Objectives


This book is designed as an introduction to the ideas and methods used to 
formulate mathematical models of physical processes in terms of random 
functions. The first five chapters use the historical development of the 
study of Brownian motion as their guiding narrative. The remaining 
chapters are devoted to methods of solution for stochastic models. The
material is too much for a single course: Chapters 1-4 along with Chapters
7 and 8 are ample for a senior undergraduate course offered to students
with a suitably mathematical background (i.e. familiarity with most of
the methods reviewed in Appendix B). For a graduate course, on the
other hand, a quick review of the first three chapters, with a focus on
later chapters, should provide a good general introduction to numerical
and analytic approximation methods in the solution of stochastic models.

The content is primarily designed to develop mathematical methods 
useful in the study of stochastic processes. Nevertheless, an effort has been 
made to tie the derivations, whenever possible, to the underlying physical 
assumptions that gave rise to the mathematics. As a consequence, very
little is said about Itô’s formula and the associated methods of what has come
to be called Stochastic Calculus. If that comes as a disappointment to the
reader, I suggest they consider C. W. Gardiner’s book:

• Handbook of Stochastic Methods (3rd Ed.), C. W. Gardiner (Springer,
2004),

as a friendly introduction to Itô’s calculus.

A list of references useful for further study appears at the beginning
of some sections, and at the end of each chapter. These references are
usually pedagogical texts or review articles, and are meant not as an
exhaustive tabulation of current results, but rather as a first step along
the road of independent research in stochastic processes. A collection of
exercises appears at the end of each chapter. Some of these are meant to
focus the reader’s attention on a particular point of the analysis; others
are meant to develop essential technical skills. None of the exercises is
very difficult, and all of them should be attempted (or at least read).

My hope is that the reader will find that the study of stochastic pro- 
cesses is not as difficult as it is sometimes made out to be. 
1.1 Stochastic Processes in Science and Engineering


Physics is the study of collective phenomena arising from the interaction 
of many individual entities. Even a cannonball dropped from a high tower 
will collide with some $10^{30}$ gas molecules on its way down. Part of the
miracle of physics is that, as a rule, only a few variables are required to 
capture the behaviour of the system, and this behaviour can, in turn, be 
described by very simple physical laws. In the case of the falling cannon- 
ball, for example, only its position and velocity are important. In hydro- 
dynamics, despite the incredible numbers of individual fluid molecules, 
the flow velocity, density and temperature are sufficient to describe the 
system under most circumstances. 

Yet the interactions that are eliminated from large-scale¹ models make
themselves felt in other ways: a ball of crumpled paper experiences a
strong drag force in flight due to air-resistance, a constitutive equation
must be provided to fix the visco-elastic properties of a fluid, etc. Drag
forces, viscosity, electrical resistance: these are all vestiges of the
microscopic dynamics left behind in the macroscopic models when the
enormous number of degrees of freedom in the original many-body problem
was integrated away to leave a simple deterministic description. These
shadows of microscopic motion are often called fluctuations or noise, and
their description and characterization will be the focus of this course.

¹ Also called macroscopic or phenomenological models.

Deterministic models (typically written in terms of systems of ordinary 
differential equations) have been very successfully applied to an endless 
variety of physical phenomena (e.g. Newton’s laws of motion); however, 
there are situations where deterministic laws alone are inadequate, par- 
ticularly as the number of interacting species becomes small and the in- 
dividual motion of microscopic bodies has observable consequences. 

The most famous example of observable fluctuations in a physical sys- 
tem is Brownian motion: the random, incessant meandering of a pollen 
grain suspended in a fluid, successfully described by Einstein in 1905. (We 
shall consider Brownian motion in greater detail throughout these notes.) 
The next major developments in the study of noise came in the 1950s
and 1960s through the analysis of electrical circuits and radio wave
propagation. Today, there is renewed interest in the study of stochastic
processes, particularly in the context of microfluidics, nanoscale devices
and the study of genetic circuits that underlie cellular behaviour.


Why study Brownian motion? 

Brownian motion will play a central role in the development of the 
ideas presented in these notes. This is partly historical because much 
of the mathematics of stochastic processes was developed in the context 
of studying Brownian motion, and partly pedagogical because Brownian 
motion provides a straightforward and concrete illustration of the mathe- 
matical formalism that will be developed. Furthermore, Brownian motion 
is a simple enough physical system that the limitations of the various as- 
sumptions employed in the modeling of physical phenomena are made 
obvious. 


1.2 Brownian motion 


Using a microscope, Robert Brown (1773-1858) observed and documented 
the motion of large pollen grains suspended in water. No matter what 
Brown did, the pollen grains moved incessantly in erratic and irregular 
motion (Figure 1.1). Brown tried different materials and different sol- 
vents, and still the motion of these particles continued. This was a time 
when most scientists did not believe in atoms or molecules, so the un- 
derlying mechanism responsible remained a mystery for nearly a century. 
In the words of S. G. Brush, “three quarters of a century of experiments
produced almost no useful results in the understanding of Brownian motion,
simply because no theorist had told the experimentalists what to
measure!”

Figure 1.1: Brownian motion. A) Sample path of Brownian motion.
B) Schematic of the physical mechanism underlying Brownian motion.
Notice the figure is not to scale: in reality, there would be about 10°
solvent molecules in the view shown, and each molecule would be about
1/100 mm in diameter.

Thanks mainly to the work of Gouy [J. de Physique 7, 561 (1888)],
several suggestive facts were known by century’s end:

1. The motion of the particles is very irregular, and the path
   appears to have no tangent at any point. (On an observable
   scale, the path appears non-differentiable, though of
   course it is continuous.)

2. The particles appear to move independently of one another,
   even when they approach closely.

3. The molecular composition and mass density of the particles
   have no effect.

4. As the solvent viscosity is decreased, the motion becomes
   more active.

5. As the particle radius is decreased, the motion becomes
   more active.

6. As the ambient temperature is increased, the motion becomes
   more active.

7. The motion never ceases.

Such was the state of affairs when Einstein published his seminal paper
in 1905 [Ann. Phys. 17, 549 (1905)], compelling him to write in the
opening paragraph of that article, that “it is possible that the movements
to be discussed are identical with the so-called ‘Brownian molecular
motion’; however, the information available to me regarding the latter is so
lacking in precision, that I can form no judgment in the matter.”

Figure 1.2: Albert Einstein and Marian Smoluchowski


1.2.1 Einstein 


Einstein’s argument is a classic — and it contains the seeds of all of the 
ideas developed in this course. To appreciate the revolutionary nature of 
Einstein’s formulation of the problem, it is important to recall that at the 
time many physicists did not believe in atoms or molecules. The argument 
is built upon two key hypotheses. First, that the motion of the particle 
is caused by many frequent impacts of the incessantly moving solvent 
molecules. Second, that the effect of the complicated solvent motion 
can be described probabilistically in terms of very frequent statistically 
independent collisions. 

Einstein’s approach follows two directions. The first is physical.
Balancing the osmotic pressure with the body forces acting on the particle,
Einstein derives a partial differential equation for the particle density
$f(x,t)$,

$$\frac{\partial f}{\partial t} = D\,\frac{\partial^2 f}{\partial x^2}.$$

Here, the diffusion coefficient $D$ is a measure of how quickly the density
distribution will spread along the $X$-axis. For a spherical Brownian
particle of radius $R$, using Stokes’s theory for the drag force acting on the
particle, Einstein arrives at the following expression for $D$,

$$D = \frac{k_B T}{6\pi\nu R}. \tag{1.1}$$

Here $\nu$ is the shear viscosity of the suspending liquid, $T$ is the absolute
temperature, and $k_B$ is Boltzmann’s constant. The units of $k_B$ are such
that $k_B T$ has the units of energy. Knowledge of the value of $k_B$ is
equivalent to the knowledge of Avogadro’s number, and hence of molecular
size. Notice how the qualitative aspects of Brownian motion (items 4-6
in the list of facts above) are reflected in the expression for the diffusion
coefficient.
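As a rough numerical illustration of Eq. 1.1, the sketch below evaluates $D$ for a micron-sized sphere in water. The temperature, viscosity and radius are illustrative assumptions, not values taken from the text, and the function name `stokes_einstein` is hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant in J/K (SI value)


def stokes_einstein(temperature, viscosity, radius):
    """Diffusion coefficient D = k_B T / (6 pi nu R) of Eq. 1.1, in m^2/s."""
    return K_B * temperature / (6.0 * math.pi * viscosity * radius)


# Illustrative values (assumptions): water near room temperature and a
# sphere of radius 0.5 micrometres.
T = 293.0      # absolute temperature, K
nu = 1.0e-3    # shear viscosity, Pa s
R = 0.5e-6     # particle radius, m

D = stokes_einstein(T, nu, R)
print(f"D = {D:.3e} m^2/s")

# Items 4-6 of the list of facts: the motion becomes more active (D grows)
# when the viscosity or the radius decreases, or the temperature increases.
ratio_viscosity = stokes_einstein(T, nu / 2, R) / D
ratio_radius = stokes_einstein(T, nu, R / 2) / D
ratio_temperature = stokes_einstein(2 * T, nu, R) / D
print(ratio_viscosity, ratio_radius, ratio_temperature)
```

With these assumed values $D$ comes out at a few times $10^{-13}$ m²/s, and each of the three ratios is 2, reflecting the inverse dependence on $\nu$ and $R$ and the linear dependence on $T$.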

The second direction of Einstein’s paper is mathematical, using simple
assumptions to derive a partial differential equation governing the
probability density of finding the Brownian particles at a given position $x$ after
a time $t$, a derivation that will be echoed throughout this course. The
original article is a wonderful piece of scientific writing, not only because 
of the historical significance, but because of the clarity and force of the 
central ideas. For that reason, the portion of the paper describing the 
derivation of the partial differential equation is reproduced below. 


Excerpt from Einstein’s paper on Brownian motion (Author’s 
translation) 


• A. Einstein (1905) Ann. Physik 17:549.


It must clearly be assumed that each individual particle ex- 
ecutes a motion which is independent of the motions of all 
other particles; it will also be considered that the movements 
of one and the same particle in different time intervals are in- 
dependent processes, as long as these time intervals are not 
chosen too small. 


We introduce a time interval $\tau$ into consideration, which is
very small compared to the observable time intervals, but nevertheless
so large that in two successive time intervals $\tau$, the
motions executed by the particle can be thought of as events
which are independent of each other.

Now let there be a total of $n$ particles suspended in a liquid. In
a time interval $\tau$, the $X$-coordinates of the individual particles
will increase by an amount $\Delta$, where for each particle $\Delta$ has a
different (positive or negative) value. There will be a certain
frequency law for $\Delta$; the number $dn$ of the particles which
experience a shift which is between $\Delta$ and $\Delta + d\Delta$ will be
expressible by an equation of the form

$$dn = n\,\phi(\Delta)\,d\Delta, \tag{1.2}$$

where

$$\int_{-\infty}^{\infty} \phi(\Delta)\,d\Delta = 1, \tag{1.3}$$

and $\phi$ is only different from zero for very small values of $\Delta$,
and satisfies the condition

$$\phi(\Delta) = \phi(-\Delta). \tag{1.4}$$

We now investigate how the diffusion coefficient depends on
$\phi$. We shall once more restrict ourselves to the case where the
number $\nu$ of particles per unit volume depends only on $x$ and
$t$.

Let $\nu = f(x,t)$ be the number of particles per unit volume.
We compute the distribution of particles at the time $t+\tau$ from
the distribution at time $t$. From the definition of the function
$\phi(\Delta)$, it is easy to find the number of particles which at time
$t+\tau$ are found between two planes perpendicular to the $x$-axis
and passing through the points $x$ and $x + dx$. One obtains²

$$f(x,t+\tau)\,dx = dx \int_{-\infty}^{\infty} f(x-\Delta,t)\,\phi(\Delta)\,d\Delta. \tag{1.5}$$

But since $\tau$ is very small, we can set

$$f(x,t+\tau) = f(x,t) + \tau\,\frac{\partial f}{\partial t}. \tag{1.6}$$

Furthermore, we develop $f(x-\Delta,t)$ in powers of $\Delta$:

$$f(x-\Delta,t) = f(x,t) - \Delta\,\frac{\partial f}{\partial x} + \frac{\Delta^2}{2!}\,\frac{\partial^2 f}{\partial x^2} + \cdots. \tag{1.7}$$

We can use this series under the integral, because only small
values of $\Delta$ contribute to this equation. We obtain

$$f + \tau\,\frac{\partial f}{\partial t} = f\int_{-\infty}^{\infty}\phi(\Delta)\,d\Delta - \frac{\partial f}{\partial x}\int_{-\infty}^{\infty}\Delta\,\phi(\Delta)\,d\Delta + \frac{\partial^2 f}{\partial x^2}\int_{-\infty}^{\infty}\frac{\Delta^2}{2}\,\phi(\Delta)\,d\Delta + \cdots. \tag{1.8}$$

² Einstein actually wrote $f(x+\Delta,t)$ in the integrand, which is incorrect (Why?).
The sign has been corrected in all subsequent equations. In the case considered by
Einstein, however, the collisions are unbiased: $\phi(\Delta) = \phi(-\Delta)$, so the sign of $\Delta$ makes
no difference whatsoever.

Because $\phi(\Delta) = \phi(-\Delta)$, the second, fourth, etc., terms on
the right-hand side vanish, while out of the 1st, 3rd, 5th, etc.,
terms, each one is very small compared with the previous. We
obtain from this equation, by taking into consideration

$$\int_{-\infty}^{\infty}\phi(\Delta)\,d\Delta = 1, \tag{1.9}$$

and setting

$$\frac{1}{\tau}\int_{-\infty}^{\infty}\frac{\Delta^2}{2}\,\phi(\Delta)\,d\Delta = D, \tag{1.10}$$

and keeping only the 1st and 3rd terms of the right-hand
side,

$$\frac{\partial f}{\partial t} = D\,\frac{\partial^2 f}{\partial x^2}. \tag{1.11}$$

...

This is already known as the differential equation of diffusion
and it can be seen that $D$ is the diffusion coefficient. ...

The problem, which corresponds to the problem of diffusion
from a single point (neglecting the interaction between the
diffusing particles), is now completely determined mathematically;
its solution is

$$f(x,t) = \frac{n}{\sqrt{4\pi D}}\,\frac{e^{-x^2/4Dt}}{\sqrt{t}}. \tag{1.12}$$

We now calculate, with the help of this equation, the displacement
$\lambda_x$ in the direction of the $X$-axis that a particle
experiences on the average or, more exactly, the square root
of the arithmetic mean of the square of the displacement in
the direction of the $X$-axis; it is

$$\lambda_x = \sqrt{\langle x^2\rangle} = \sqrt{2Dt}. \tag{1.13}$$
(End of excerpt) 
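Einstein’s construction can be checked numerically by iterating Eq. 1.5 directly. The following is a minimal sketch, assuming a hypothetical jump density $\phi(\Delta)$ that is uniform on a short symmetric interval; the grid spacing, $\tau$ and the number of steps are arbitrary illustrative choices. The density is started from a single point with $n = 1$, and the result is compared with Eqs. 1.10, 1.12 and 1.13.

```python
import numpy as np

dx = 0.01
x = np.arange(-5.0, 5.0 + dx / 2, dx)        # spatial grid

# Hypothetical jump density phi(Delta): symmetric (Eq. 1.4) and non-zero
# only for small |Delta|. A uniform density is assumed for illustration.
a = 0.05
delta = np.arange(-a, a + dx / 2, dx)
phi = np.ones_like(delta)
phi /= phi.sum() * dx                        # normalisation, Eq. 1.3

tau = 1.0e-3
D = 0.5 * np.sum(delta**2 * phi) * dx / tau  # Eq. 1.10

# Start with all particles at the origin (n = 1).
f = np.zeros_like(x)
f[len(x) // 2] = 1.0 / dx

steps = 200
for _ in range(steps):
    # Eq. 1.5: f(x, t + tau) = integral of f(x - Delta, t) phi(Delta) dDelta
    f = np.convolve(f, phi, mode="same") * dx

t = steps * tau
variance = np.sum(x**2 * f) * dx
gauss = np.exp(-x**2 / (4 * D * t)) / np.sqrt(4 * np.pi * D * t)  # Eq. 1.12

print("total probability:", np.sum(f) * dx)
print("variance, 2Dt    :", variance, 2 * D * t)                # Eq. 1.13
print("max |f - gauss|  :", np.max(np.abs(f - gauss)))
```

The variance agrees with $2Dt$ because variances of independent jumps add, while the closeness of $f$ to the Gaussian of Eq. 1.12 reflects the truncation of the series in Eq. 1.8 after the second-derivative term. A different symmetric $\phi$ should give the same picture with a different $D$.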
The temperature $T$ and the viscosity of the surrounding fluid, $\nu$, can
be measured, and with care a suspension of spherical particles of fairly
uniform radius $R$ may be prepared. The diffusion coefficient $D$ is then
available through the statistics of the Brownian particle (via Eq. 1.13).
In this way, Boltzmann’s constant (or, equivalently, Avogadro’s number)
can be determined. This program was undertaken in a series of laborious
experiments by Perrin and Chaudesaigues, resulting in a Nobel prize
for Perrin in 1926. The coupling of Eq. 1.1 and Eq. 1.13 is one of the
earliest examples of using fluctuations to quantify a physical parameter
(as opposed to simply considering the averaged behaviour). Said another
way, it is the irreproducibility in the experiment that is reproducible!
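The logic of this measurement can be mimicked with synthetic data. The sketch below is an illustration only, not a description of Perrin’s actual procedure: it draws displacements from the Gaussian of Eq. 1.12 for assumed values of $T$, $\nu$ and $R$, then recovers $D$ from Eq. 1.13 and $k_B$ from Eq. 1.1 as an experimenter would.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# "True" values, used only to generate the synthetic data (assumptions).
K_B_TRUE = 1.380649e-23                 # J/K
T, nu, R = 293.0, 1.0e-3, 0.5e-6        # K, Pa s, m
D_true = K_B_TRUE * T / (6 * math.pi * nu * R)       # Eq. 1.1

# Synthetic experiment: x-displacements of many particles after a time t,
# drawn from the Gaussian solution Eq. 1.12 (variance 2 D t).
t = 30.0                                # observation time, s
n_particles = 50_000
x = rng.normal(0.0, math.sqrt(2 * D_true * t), size=n_particles)

# Analysis: only x, t, T, nu and R are treated as known from here on.
D_est = np.mean(x**2) / (2 * t)                      # Eq. 1.13
k_B_est = 6 * math.pi * nu * R * D_est / T           # Eq. 1.1 solved for k_B

print(f"estimated k_B = {k_B_est:.3e} J/K")
print(f"relative error = {k_B_est / K_B_TRUE - 1:+.3%}")
```

The estimate comes entirely from the scatter of the displacements; the mean displacement carries no information about $k_B$.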

Einstein’s argument does not give a dynamical theory of Brownian
motion; it only determines the nature of the motion and the value of
the diffusion coefficient on the basis of some assumptions. Smoluchowski,
independently of Einstein, attempted a dynamical theory and arrived
at Eq. 1.1 with an additional factor of 32/27 on the right-hand side
(Section 1.2.2). Smoluchowski also laid the foundation for a more extensive
study of Brownian motion, including the possibility of various forces
acting upon the particle. Langevin provided another derivation of Eq. 1.1
(Section 1.2.3) which was the starting point for the work of Ornstein and
Uhlenbeck (Section 1.2.4).

Einstein’s work was of great importance in physics, for it showed in a 
visible and concrete way that atoms were real. Quoting from Einstein’s 
autobiography: 


The agreement of these considerations with experience to- 
gether with Planck’s determination of the true molecular size 
from the law of radiation (for high temperature) convinced 
the skeptics, who were quite numerous at the time (Ostwald, 
Mach) of the reality of atoms. The antipathy of these schol- 
ars towards atomic theory can indubitably be traced back to 
their positivistic philosophical attitude. This is an interest- 
ing example of the fact that even scholars of audacious spirit 
and fine instinct can be obstructed in the interpretation of the 
facts by philosophical prejudices. 


Relation between Einstein’s derivation and the rest of the course


Einstein’s work anticipated much of what was to come in the study of 
stochastic processes in physics. A few key steps will be highlighted, with 
reference to their appearance later in the course notes. Quoting from 
Gardiner’s Handbook, 


• The Chapman-Kolmogorov Equation (Eq. 1.5) states that the probability
of the particle being at point $x$ at time $t + \tau$ is given by
the sum of probabilities of all possible “pushes” $\Delta$ from positions
$x - \Delta$, multiplied by the probability of being at $x - \Delta$ at time $t$. This
assumption is based on the independence of the push $\Delta$ of any previous
history of the motion: it is only necessary to know the initial
position of the particle at time $t$, not at any previous time. This
is the Markov postulate and the Chapman-Kolmogorov equation, of
which Eq. 1.5 is a special form, is the central dynamical equation
to all Markov processes. The Chapman-Kolmogorov equation and
Markov processes will be studied in detail in Chapter 3.

• The Fokker-Planck Equation (Eq. 1.11) is the diffusion equation, a
special case of the Fokker-Planck equation, which describes a large
class of very interesting stochastic processes in which the system
has a continuous sample path. In this case, that means that the
pollen grain’s position, if thought of as obeying a probabilistic law
given by solving the diffusion equation, Eq. 1.11, in which time $t$ is
continuous (not discrete, as assumed by Einstein), can be written
$x(t)$, where $x(t)$ is a continuous function of time, but a random
function. This leads us to consider the possibility of describing the
dynamics of the system in some direct probabilistic way, so that we
would have a random or stochastic differential equation for the path.
This procedure was initiated by Langevin with the famous equation
that to this day bears his name [see the following section]. We will
discuss this in detail at the end of Chapter 6.

• The Kramers-Moyal and similar expansions are essentially the same
as that used by Einstein to go from Eq. 1.5 to Eq. 1.11. The use
of this type of approximation, which effectively replaces a process
whose sample paths need not be continuous with one whose paths
are continuous, has been a topic of discussion in the last decade. Its
use and validity will be discussed in Chapters 4 and 6.

• The Fluctuation-Dissipation Relation: Eqs. 1.1 and 1.13 connect the
dissipation in the system with the magnitude of the fluctuations in
thermal equilibrium (see Section 2.3.2 and Section 5.1, p. 105).


1.2.2 Smoluchowski
• M. von Smoluchowski (1906) Ann. Physik 21:756.


Independently of Einstein, Smoluchowski developed a complementary
model of Brownian motion. His original derivation has more in common
with Rayleigh’s study of Brownian motion (see Exercise 9 on p. 149),
although the essence of his argument, and his legacy to the modeling
of stochastic processes, can be appreciated by considering a
spatially-discretized version of Eq. 1.5.

Figure 1.3: Schematic illustration of an unbiased random walk
in one dimension. A particle at position $x$ can move to neighbouring
positions $x + \Delta x$ and $x - \Delta x$ with equal probability during each time-step
$(t, t+\tau)$.

In addition to a discretization of time ($t$, $t + \tau$, $t + 2\tau$, etc.), imagine
the motion of the particle is likewise constrained to discrete positions
$\ldots, x - \Delta x, x, x + \Delta x, \ldots$ (Figure 1.3). To fully characterize the model,
one assigns a starting position (generally $x = 0$ at $t = 0$ for convenience)
and a transition probability between the lattice points.

Smoluchowski’s model of Brownian motion assumes that a particle can
move only one step to the right or to the left, and that these occur with
equal probability. This type of process is called an unbiased random walk,
an example of a more general class of one-step processes. Mathematically,
the conditional probability density $P(x, t)$ for the unbiased random walk
obeys a difference equation,

$$P(x, t+\tau) = \frac{1}{2}P(x - \Delta x, t) + \frac{1}{2}P(x + \Delta x, t), \qquad P(x, 0) = \delta_{x,0}, \tag{1.14}$$

where $\delta_{ij}$ is the Kronecker delta. The two terms in the right-hand side of
Eq. 1.14 represent the two ways in which a particle can move into position
$x$ during the interval $(t, t+\tau)$. Eq. 1.14 can be solved explicitly for the full
time-dependent conditional distribution $P(x,t)$ using transform methods
(see Exercise 7 on p. 25, and Chapter 4). Furthermore, in a certain limit,
the Smoluchowski model reduces to Einstein’s diffusion equation, Eq. 1.11
(see Exercise 6 on p. 24).
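Eq. 1.14 is simple enough to iterate directly. The sketch below is a minimal illustration, with an arbitrary lattice size and number of steps, and with positions measured in units of $\Delta x$. It propagates $P(x,t)$ from the initial condition $\delta_{x,0}$ and checks the result against the binomial probability of returning to the origin.

```python
import math
import numpy as np

n_sites = 401                  # lattice sites x = -200 dx, ..., +200 dx
centre = n_sites // 2
P = np.zeros(n_sites)
P[centre] = 1.0                # initial condition P(x, 0) = delta_{x,0}

n_steps = 100                  # chosen so the walker cannot reach the edges
for _ in range(n_steps):
    P_new = np.zeros_like(P)
    P_new[1:] += 0.5 * P[:-1]  # arrivals from x - dx
    P_new[:-1] += 0.5 * P[1:]  # arrivals from x + dx
    P = P_new                  # Eq. 1.14

k = np.arange(n_sites) - centre        # position in units of dx
total = P.sum()
mean = np.sum(k * P)
var = np.sum(k**2 * P)

# After an even number n of steps, the walker is back at the origin with
# binomial probability C(n, n/2) / 2^n.
p_origin_exact = math.comb(n_steps, n_steps // 2) / 2**n_steps

print(total, mean, var)
print(P[centre], p_origin_exact)
```

The variance equals the number of steps, so in physical units $\langle x^2\rangle = (\Delta x^2/\tau)\,t$. This is linear in time, as Eq. 1.13 requires, and matching the two would correspond to $D = \Delta x^2 / 2\tau$.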

In many applications where x represents physical space, discretization 
is unnatural. In problems where the state space is inherently discrete, such 
as molecule numbers or animal populations, Smoluchowski’s methodology
is particularly convenient. It is not difficult to imagine whole classes of 
more complicated processes that can be constructed along similar lines 
by allowing, for example, multiple steps, biased transition rates, state- 
dependent transition rates, multidimensional lattices, etc. These more 
general models will be explored in more detail in Section 3.3. 


Relation between Smoluchowski’s formulation and the rest of 
the course 


• The master equation: Smoluchowski’s conceptual framework is ideal
for many physical applications. Many physical systems, with an
appropriately chosen coarse-graining of the observation time-scale,
can be represented by Markov processes (Chapter 3). The main
dynamical equation for a Markov process is the Chapman-Kolmogorov
equation (see p. 52). Under the same coarse-graining of the observation
time-scale that permits the physical process to be described as
Markovian (see p. 59), the Chapman-Kolmogorov equation reduces
to the simpler master equation. The random walk (Eq. 1.14) is
an example of a master equation with discrete states. In practice,
the physics of the process is captured by the transition probabilities
among lattice sites, and the master equation is then used to predict
how the state evolves in time. The master equation is rarely solvable
exactly, so a large portion of the course will be devoted to simulating
or approximating the solutions. The master equation and Markov
processes will be studied in detail in Chapter 3.


1.2.3 Langevin
• P. Langevin (1908) C. R. Acad. Sci. (Paris) 146:530.

• English translation and commentary: D. S. Lemons and A. Gythiel (1997) Am.
J. Phys. 65: 1079-1081.


A few years after Einstein and Smoluchowski’s work, a new method 
for studying Brownian motion was developed by Paul Langevin. Langevin 
arrives at the same result for the mean-squared displacement as Einstein, 
although coming from a very different perspective and following [in his 
words] an “infinitely simpler” derivation. Beginning with the trajectory of 
a single particle, his work marks the first attempt to provide a dynamical 
theory of Brownian motion (to be contrasted with Einstein’s kinematical 
theory). Langevin’s argument is as follows. 
Figure 1.4: Paul Langevin with Einstein


A single Brownian particle is subject to Newton’s laws of motion. The
Brownian particle moves around because it is subject to a resultant force
$F$ coming from the collision with the solvent molecules. With enough
information about $F$, Brownian motion can be studied using Newton’s
second law, $F = ma$,

$$m\,\frac{d^2x}{dt^2} = F. \tag{1.15}$$

The precise, microscopic form of the force acting on the particle is
unknown; however, from a macroscopic point of view, it is reasonable to
consider it as the resultant of two different forces:

1. Viscous drag $f_d = -\gamma\,\frac{dx}{dt}$ ($\gamma = 6\pi\nu R$) for a sphere of radius $R$ in
slow motion in a fluid with shear viscosity $\nu$ (Stokes flow).

2. Fluctuating (random) force $f_r(t)$ due to the incessant molecular
impacts, about which next to nothing is known.

Under these assumptions, the equation of motion for the Brownian
particle is,

$$m\,\frac{d^2x}{dt^2} = -\gamma\,\frac{dx}{dt} + f_r(t), \tag{1.16}$$

or, after multiplication with $x$ and re-arrangement,

$$\frac{m}{2}\,\frac{d^2}{dt^2}\left(x^2\right) - m v^2 = -\frac{\gamma}{2}\,\frac{d}{dt}\left(x^2\right) + x f_r(t). \tag{1.17}$$

Here $v = dx/dt$ is the velocity of the particle.
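The re-arrangement that takes Eq. 1.16 to Eq. 1.17 rests on two product-rule identities, written out here as a worked step (with $v = dx/dt$):

```latex
\begin{align*}
x\,\frac{dx}{dt} &= \frac{1}{2}\,\frac{d}{dt}\left(x^2\right), \\
x\,\frac{d^2x}{dt^2} &= \frac{d}{dt}\left(x\,\frac{dx}{dt}\right) - \left(\frac{dx}{dt}\right)^2
  = \frac{1}{2}\,\frac{d^2}{dt^2}\left(x^2\right) - v^2 .
\end{align*}
```

Multiplying Eq. 1.16 by $x$ and substituting these two identities on the left- and right-hand sides gives Eq. 1.17 term by term.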
Langevin remarks that “since the quantity $f_r(t)$ has great irregularity”
Eq. 1.17 is not a differential equation in the ordinary sense. In fact, since
the forcing $f_r(t)$ is a random function, so, too, will be the path $x(t)$.
The path $x(t)$ was not Langevin’s objective, however. Assuming Eq. 1.17
has a well-defined meaning in a statistical sense, he proceeded to average
the equation over a large number of similar (but different) particles in
Brownian motion under the same conditions.

Denote this average by $\langle\cdots\rangle$, and assume that it commutes with the
derivative operator $\frac{d}{dt}$; then from Eq. 1.17,

$$\frac{m}{2}\,\frac{d^2}{dt^2}\langle x^2\rangle - \langle m v^2\rangle = -\frac{\gamma}{2}\,\frac{d}{dt}\langle x^2\rangle + \langle x f_r(t)\rangle. \tag{1.18}$$

From the equipartition of energy theorem derived in equilibrium statistical
mechanics, Langevin was able to replace $\langle m v^2\rangle$, twice the average
kinetic energy of the particle’s motion along $x$, by $k_B T$, where $k_B$ is
Boltzmann’s constant and $T$ is the absolute temperature³. The
cross-correlation $\langle x f_r(t)\rangle$ Langevin dismissed
by saying “it can be set = 0, due to the great irregularity of $f_r(t)$.” The
resulting differential equation for the mean-squared displacement $\langle x^2\rangle$ is
then simply,

$$m\,\frac{d^2}{dt^2}\langle x^2\rangle + \gamma\,\frac{d}{dt}\langle x^2\rangle = 2k_B T. \tag{1.19}$$

Integrating once,

$$\frac{d}{dt}\langle x^2\rangle = \frac{2k_B T}{\gamma} + C e^{-\Omega t} \qquad \left(C = \text{constant}, \quad \Omega = \frac{\gamma}{m}\right). \tag{1.20}$$

Langevin proceeds by noting that experimentally $\Omega \approx 10^{8}$ Hz; so the
exponential may be dropped after a time period of about $10^{-8}$ s has elapsed.
Another integration yields the final result,

$$\langle x^2\rangle - \langle x_0^2\rangle = 2\,\frac{k_B T}{\gamma}\,t = 2Dt, \tag{1.21}$$

if we identify the diffusion coefficient with Einstein’s result
$D = \frac{k_B T}{\gamma} = \frac{k_B T}{6\pi\nu R}$.
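The passage from Eq. 1.19 to Eq. 1.21 can be checked by integrating Eq. 1.19 numerically. This is a minimal sketch under stated assumptions: the particle size, density and fluid properties are illustrative, and the initial conditions $\langle x^2\rangle = 0$ and $\frac{d}{dt}\langle x^2\rangle = 0$ at $t = 0$ are chosen for convenience (they fix $C = -2k_BT/\gamma$ in Eq. 1.20).

```python
import math

k_B, T = 1.380649e-23, 293.0              # J/K, K
nu, R = 1.0e-3, 0.5e-6                    # assumed: water, 0.5 micrometre sphere
rho = 1.0e3                               # assumed particle density, kg/m^3

gamma = 6 * math.pi * nu * R              # Stokes friction coefficient
m = rho * 4.0 / 3.0 * math.pi * R**3      # particle mass
Omega = gamma / m                         # relaxation rate in Eq. 1.20
D = k_B * T / gamma                       # Einstein's result, Eq. 1.1

# Write Eq. 1.19 as a first-order system for y = <x^2> and w = dy/dt and
# advance it with explicit Euler steps, starting from y = 0, w = 0.
dt = 1.0e-3 / Omega
n_steps = 20_000                          # total time 20 / Omega
y, w = 0.0, 0.0
for _ in range(n_steps):
    y, w = y + w * dt, w + (2 * k_B * T - gamma * w) / m * dt

t = n_steps * dt
print(f"relaxation time 1/Omega = {1 / Omega:.2e} s")
print(f"slope / (2 D)           = {w / (2 * D):.6f}")
print(f"<x^2> / (2 D t)         = {y / (2 * D * t):.4f}")
```

After a few relaxation times the slope settles at $2D$, which is Eq. 1.21. The ratio $\langle x^2\rangle / 2Dt$ stays slightly below one because of the initial transient, which is the short-time behaviour that Eq. 1.21 neglects.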


³ The equipartition of energy theorem comes from classical statistical mechanics (in
the absence of quantum effects). It states that, at thermal equilibrium, energy is shared
equally among all forms and all degrees of freedom. For an ideal gas, the average kinetic
energy in thermal equilibrium of each molecule is $(3/2)k_B T$. In the $x$-component of
the velocity, the energy is $(1/2)k_B T$; this is the result used by Langevin.

It should be clear from Langevin’s equation that stochastic processes
afford a tremendous simplification of the governing model. Instead of
including ~ 10? auxiliary variables describing the movement of the solvent
molecules, Langevin includes their effect as the single random forcing
function $f_r(t)$, leading to a closed equation for the particle position $x(t)$.
Quoting van Kampen:


Together with its surrounding fluid [the Brownian particle] 
constitutes a closed, isolated system. The “relevant” variable 
x is the position of the particle, which constitutes a stochas- 
tic process... [This] is due to the fact that the single variable 
x does not really obey a closed differential equation, but in- 
teracts with a host of fluid molecules... These variables are 
not visible in the equation for x but their effect shows up in 
the random Langevin force. Fluctuations in x are constantly 
being generated by the collisions. 


Notice, in retrospect, the similarity between $\phi(\Delta)$ of Einstein and $f_r(t)$
of Langevin: both characterize the fluctuations. Nevertheless, while $\phi(\Delta)$
describes the effect of the force imparted by the collisions (a kinematic
approach), $f_r(t)$ treats the force itself as a random variable (a dynamic
approach). As a result, Einstein derives a partial differential equation
governing the entire probability distribution, whereas Langevin derives an
ordinary differential equation for the first two moments of the distribution.

Several critical remarks can be made regarding Langevin’s derivation, 
and criticisms were not long in coming. 


1. What, precisely, is the function $f_r(t)$? We know what a random
variable is, and we know what a function is, but $f_r(t)$ is a random
function. What is the mathematical meaning of that statement?

2. If $f_r(t)$ is a random function, then $x(t)$ should inherit some of that
randomness. We assume $x(t)$ obeys a differential equation, but how
do we know the derivatives $\frac{dx}{dt}$ and $\frac{d^2x}{dt^2}$ even exist?

3. What kind of averaging are we using in moving from Eq. 1.17 to
Eq. 1.18? Are we picking some a priori probability distribution and
computing the expectation? If so, how do we choose the probability
distribution? Are we taking an arithmetical average?

4. Setting $Ce^{-\Omega t} = 0$ because $\Omega$ is large only holds if $t$ is large. What
is going on over very short timescales?


Soon after the 1908 paper by Langevin, several outstanding contributions
to the theory of Brownian motion appeared. Of particular interest is
the 1930 paper by Uhlenbeck and Ornstein that sought to strengthen the
foundations of Langevin’s approach.

Figure 1.5: Leonard Ornstein and George Uhlenbeck


Relation between Langevin’s approach and the rest of the course 


• Random differential equations: Langevin’s formulation of stochastic
processes in terms of Newton’s laws of motion still appeals to many
physicists. For systems governed by linear dynamics in the absence
of fluctuations, Langevin’s approach is very intuitive and simple to
formulate. For nonlinear systems, however, Langevin’s approach
cannot be used. Differential equations with stochastic forcing or
stochastic coefficients are the subject of Chapter 8. The difficulties
of Langevin’s approach for nonlinear equations are discussed in N. G.
van Kampen’s article ‘Fluctuations in Nonlinear Systems.’

• Stochastic analysis: The cavalier treatment of the analytic properties
of the random force $f_r(t)$ and the resulting analytic properties
of the stochastic process $x(t)$ generated extensive activity by
mathematicians to solidify the foundations of stochastic calculus. Of
particular note is the work of Kolmogorov (see p. 30), Doob (p. 20)
and Itô (p. 300).


1.2.4 Ornstein and Uhlenbeck 


The Ornstein-Uhlenbeck paper (G. E. Uhlenbeck and L. S. Ornstein
(1930) Phys. Rev. 36:823) begins by reminding the reader that the
central result of both Einstein and Langevin is the mean-squared
displacement of the Brownian particle; Einstein obtained the result ($\gamma$ is the
friction coefficient),

$$\langle x^2\rangle = 2Dt, \qquad D = \frac{k_B T}{\gamma}, \tag{1.22}$$

kinematically, from the evolution equation of the probability distribution,
while many workers preferred Langevin’s dynamical approach based upon
Newton’s law,

$$m\,\frac{du}{dt} = -\gamma u + f_r(t), \qquad u = \frac{dx}{dt}, \tag{1.23}$$

along with some “natural” assumptions about the statistical properties
of $f_r(t)$.

Next, Ornstein and Uhlenbeck go on to observe that the Brownian motion
problem is still far from solved, and that the two following questions
have yet to be answered:

• Einstein, Smoluchowski and Langevin all assumed a time scale such
that

$$t \gg \frac{m}{\gamma}. \tag{1.24}$$

Under what conditions is this assumption justified?

• What changes must be made to the formulation if the Brownian
particle is subject to an external field?

The Ornstein and Uhlenbeck paper is devoted to answering these
outstanding questions. Their paper, too, is a classic. We paraphrase it
below, although the reader is encouraged to seek out the original.


The problem is to determine the probability that a free particle 
in Brownian motion after the time ¢ has a velocity which lies 
between u and u+ du, when it started at t = 0 with velocity 
uo. 


They declare their intention of using Langevin’s equation,

\[ \frac{du}{dt} + \beta u = A(t), \qquad \beta = \frac{\gamma}{m}, \quad A(t) = \frac{f(t)}{m}, \tag{1.25} \]

whose solution is,

\[ u = u_0 e^{-\beta t} + e^{-\beta t} \int_0^t e^{\beta \xi} A(\xi) \, d\xi. \tag{1.26} \]

(By solution, they mean an integration of the Langevin equation. Is that
sensible?) On taking the average over the initial ensemble, using the
assumption that $\langle A(t) \rangle = 0$ as Langevin did,

\[ \langle u \rangle_{u_0} = u_0 e^{-\beta t}. \tag{1.27} \]

The subscript means ‘average of all $u$ that began with velocity $u_0$ at time
$t = 0$.’ Squaring Eq. 1.26 and taking the average,

\[ \langle u^2 \rangle_{u_0} = u_0^2 e^{-2\beta t} + e^{-2\beta t} \int_0^t \int_0^t e^{\beta(\xi + \eta)} \langle A(\xi) A(\eta) \rangle \, d\xi \, d\eta. \tag{1.28} \]


To make progress, we now assume that,

\[ \langle A(t_1) A(t_2) \rangle = \phi_1(t_1 - t_2), \tag{1.29} \]

where $\phi_1$ is a function with values close to zero everywhere except at
the origin where it has a very sharp peak. Physically, this is justifiable
on the grounds that the fluctuating function A(t) has values that are
correlated only over very short time intervals. As we will see in the coming
chapters, $\phi_1$ is called the correlation function or autocorrelation of A(t).
The dependence on time difference only, $t_1 - t_2$, is typical of a special
class of stochastic processes.
In Ornstein and Uhlenbeck’s own words:


[W]e will naturally make the following assumptions... There
will be correlation between the values of [the random forcing
function $A(t)$] at different times $t_1$ and $t_2$ only when $|t_1 - t_2|$
is very small. More explicitly we shall suppose that:

\[ \langle A(t_1) A(t_2) \rangle = \phi(t_1 - t_2), \]

where $\phi(x)$ is a function with a very sharp maximum at $x = 0$.


Making the change of variable $\xi + \eta = v$ and $\xi - \eta = w$, the integration
can be carried out,

\[ \langle u^2 \rangle_{u_0} = u_0^2 e^{-2\beta t} + \frac{\tau_1}{2\beta} \left( 1 - e^{-2\beta t} \right); \qquad \tau_1 = \int_{-\infty}^{\infty} \phi_1(w) \, dw, \tag{1.30} \]


(where the integration limits in the definition of $\tau_1$ extend to $\pm\infty$ because
$\phi_1 \approx 0$ except near the origin). Recalling that the equipartition of energy
theorem requires that,

\[ \lim_{t \to \infty} \langle u^2 \rangle = \frac{k_B T}{m}, \tag{1.31} \]

we have, in that same limit, that $\frac{\tau_1}{2\beta} = \frac{k_B T}{m}$, so,

\[ \tau_1 = \frac{2 \beta k_B T}{m}, \tag{1.32} \]

and the final result for the mean-squared velocity of the Brownian particle
is,

\[ \langle u^2 \rangle_{u_0} = u_0^2 e^{-2\beta t} + \frac{k_B T}{m} \left( 1 - e^{-2\beta t} \right). \tag{1.33} \]

If, in addition to Eq. 1.29, it is assumed that fluctuations A(t) are Gaussian
random variables at every time t, it is then possible to show that the
random quantity $u - u_0 e^{-\beta t}$ also obeys a Gaussian (normal) probability
distribution given by,

\[ G(u, u_0, t) = \left[ \frac{m}{2\pi k_B T \left( 1 - e^{-2\beta t} \right)} \right]^{1/2} \exp\left[ -\frac{m \left( u - u_0 e^{-\beta t} \right)^2}{2 k_B T \left( 1 - e^{-2\beta t} \right)} \right]. \tag{1.34} \]

Having derived the velocity distribution, Ornstein and Uhlenbeck proceed
to calculate the distribution for the displacement.

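The problem is to determine the probability that a free particle
in Brownian motion which at $t = 0$ starts from $x = x_0$ with
velocity $u_0$ will lie, after a time $t$, between $x$ and $x + dx$. It
is clear that this probability will depend on $s = x - x_0$ only,
and on $t$.


In analogy with the velocity, they begin with Eq. 1.26,

\[ u = u_0 e^{-\beta t} + e^{-\beta t} \int_0^t e^{\beta \xi} A(\xi) \, d\xi, \tag{1.35} \]

and integrate, with the result,

\[ x = x_0 + \frac{u_0}{\beta} \left( 1 - e^{-\beta t} \right) + \int_0^t e^{-\beta \eta} \left[ \int_0^{\eta} e^{\beta \xi} A(\xi) \, d\xi \right] d\eta; \tag{1.36} \]

integrating partially this may be written in the form,

\[ s = x - x_0 = \frac{u_0}{\beta} \left( 1 - e^{-\beta t} \right) - \frac{1}{\beta} e^{-\beta t} \int_0^t e^{\beta \xi} A(\xi) \, d\xi + \frac{1}{\beta} \int_0^t A(\xi) \, d\xi. \tag{1.37} \]

Taking the mean gives,

\[ \langle s \rangle_{u_0} = \frac{u_0}{\beta} \left( 1 - e^{-\beta t} \right). \tag{1.38} \]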
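By squaring, averaging, and computing the double integral as above, the
mean-squared displacement is,

\[ \langle s^2 \rangle_{u_0} = \frac{\tau_1}{\beta^2} t + \frac{u_0^2}{\beta^2} \left( 1 - e^{-\beta t} \right)^2 + \frac{\tau_1}{2\beta^3} \left( -3 + 4 e^{-\beta t} - e^{-2\beta t} \right), \tag{1.39} \]

where $\tau_1$ is as above. This result was first obtained by Ornstein (1917).
In the limit of long times, it goes over to Einstein’s result,

\[ \langle s^2 \rangle_{u_0} \sim \frac{\tau_1}{\beta^2} t = 2 \frac{k_B T}{m \beta} t \quad \text{as } t \to \infty. \tag{1.40} \]

For short times, on the other hand, we have,

\[ \langle s \rangle_{u_0} \sim u_0 t \quad \text{and} \quad \langle s^2 \rangle_{u_0} \sim u_0^2 t^2 \quad \text{as } t \to 0, \tag{1.41} \]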


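i.e. the motion is uniform with velocity $u_0$. Taking a second average over
$u_0$ (using the distribution derived above), and since $\langle u_0^2 \rangle = \frac{k_B T}{m}$, we have,

\[ \langle\langle s \rangle\rangle = 0, \qquad \langle\langle s^2 \rangle\rangle = \frac{\tau_1}{\beta^3} \left( \beta t - 1 + e^{-\beta t} \right) = 2 \frac{k_B T}{m \beta^2} \left( \beta t - 1 + e^{-\beta t} \right). \tag{1.42} \]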
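Since Eq. 1.39 depends on the initial velocity only through $u_0^2$, the thermal average in Eq. 1.42 amounts to replacing $u_0^2$ by $k_B T/m$. The short sketch below (function names are hypothetical; $\tau_1$ is taken from Eq. 1.32) evaluates both formulas so that this identity, and the short- and long-time limits quoted in Eqs. 1.40 and 1.41, can be checked by direct arithmetic.

```python
import math


def msd_given_u0(u0, beta, tau1, t):
    """Eq. 1.39: mean-squared displacement for a fixed initial velocity u0."""
    e1 = math.exp(-beta * t)
    e2 = math.exp(-2.0 * beta * t)
    return (tau1 * t / beta ** 2
            + (u0 / beta) ** 2 * (1.0 - e1) ** 2
            + tau1 / (2.0 * beta ** 3) * (-3.0 + 4.0 * e1 - e2))


def msd_thermal(beta, kT_over_m, t):
    """Eq. 1.42: Eq. 1.39 averaged over a thermal distribution of u0."""
    return 2.0 * kT_over_m / beta ** 2 * (beta * t - 1.0 + math.exp(-beta * t))


if __name__ == "__main__":
    beta, kT_over_m = 2.0, 0.5
    tau1 = 2.0 * beta * kT_over_m          # Eq. 1.32
    D = kT_over_m / beta                   # D = k_B T / gamma, with gamma = m * beta
    # Thermal average: set u0^2 = k_B T / m in Eq. 1.39.
    print(msd_given_u0(math.sqrt(kT_over_m), beta, tau1, 1.3),
          msd_thermal(beta, kT_over_m, 1.3))
    # Long times: ratio to Einstein's 2*D*t approaches 1.
    print(msd_thermal(beta, kT_over_m, 1e4) / (2.0 * D * 1e4))
    # Short times: ratio to the ballistic u0^2 * t^2 approaches 1.
    print(msd_given_u0(1.0, beta, tau1, 1e-3) / (1.0 * 1e-3 ** 2))
```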


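The calculation of higher powers proceeds similarly. In this way, one can
show that the variable,

\[ S = s - \frac{u_0}{\beta} \left( 1 - e^{-\beta t} \right), \tag{1.43} \]

is Gaussian distributed, too, with

\[ F(x, x_0, t) = \left[ \frac{m \beta^2}{2\pi k_B T \left( 2\beta t - 3 + 4e^{-\beta t} - e^{-2\beta t} \right)} \right]^{1/2} \times \exp\left[ -\frac{m \beta^2 \left( x - x_0 - \frac{u_0}{\beta} \left( 1 - e^{-\beta t} \right) \right)^2}{2 k_B T \left( 2\beta t - 3 + 4e^{-\beta t} - e^{-2\beta t} \right)} \right], \tag{1.44} \]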


under the assumption that A(t) is Gaussian. The Ornstein-Uhlenbeck
paper contains more than the results described here. For instance, they
re-derive the equation using what is called the Fokker-Planck equation
(which we shall meet later, in Chapter 6). They also consider the effect
of an external force, by solving the problem of a harmonically-bound
particle,

\[ \frac{d^2 x}{dt^2} + \beta \frac{dx}{dt} + \omega^2 x = A(t). \tag{1.45} \]

Although the approach of Langevin was improved and expanded upon by
Ornstein and Uhlenbeck, some more fundamental problems remained, as
Doob pointed out in the introduction to his famous paper of 1942 (J. L.
Doob (1942) Annals of Math. 43:351),


The purpose of this paper is to apply the methods and results 
of modern probability theory to the analysis of the Ornstein- 
Uhlenbeck distribution, its properties, and its derivation... 
A stochastic differential equation is introduced in a rigorous 
fashion to give a precise meaning to the Langevin equation for 
the velocity function. This will avoid the usually embarrassing 
situation in which the Langevin equation, involving the second- 
derivative of x, is used to find a solution x(t) not having a 
second-derivative. 


Obviously there is something deep going on in terms of the differentiability 
and integrability of a stochastic process. In particular, to make sense of 
Doob’s statement, we need to know what is meant by differentiability in 
this context. That will be the focus of Chapter 7. 


Relation between Ornstein and Uhlenbeck’s work and the rest 
of the course 


• Fokker-Planck equation: In addition to clarifying the assumptions 
left implicit by Langevin, and permitting a generalization of Langevin’s 
approach, Ornstein and Uhlenbeck provided a mathematical con- 
nection between the kinematic approach of Einstein (using a partial 
differential equation to describe the probability distribution) and 
the dynamical approach of Langevin (using an ordinary differential 
equation with random forcing to describe a single trajectory). That 
connection will be made explicit in Chapter 6. 


1.3. Outline of the Book 


In broad strokes, the outline of the book is as follows: First, the fundamental
concepts are introduced:

Chapter 2 - Random processes: The ideas of classical probability 
theory (Appendix A) are extended from random variables to the concept 
of random functions. We focus on a particular class of stochastic pro- 
cesses — called stationary processes — that will be used throughout the 
course. Particular attention is paid to the second moment of a stochastic 
function evaluated at different times, called the correlation function. 


After the fundamentals have been established, we examine how stochastic 
processes are classified, with particular focus on the most common process 
appearing in the modeling of physical systems: The Markov process. 
Chapter 3 - Markov processes: The Markov process is the most commonly
used stochastic descriptor. It has the property that the future state of a
system is determined by the present state, and not by any states in the past. This is 
a strong requirement, but it is often approximately satisfied by many 
physical systems and allows a useful coarse-graining of the microscopic 
dynamics. The evolution equation for the probability distribution of a 
Markov process is often called the master equation, a difficult equation to 
solve exactly. The remaining chapters are, to a large extent, methods of 
approximating the master equation so that it is more amenable to solu- 
tion. 


Once Markov processes and their evolution equations have been derived, 
we examine in more detail solution methods and the kinematic formula- 
tion of Einstein. 

Chapter 4 - Solution of the Master Equation: We briefly discuss exact 
solution methods. In most cases of interest, the master equation is too 
difficult to solve exactly. We examine the most popular approximation 
method — stochastic simulation. 

Chapter 5 - Perturbation Expansion of the Master Equation: 
The linear noise approximation of van Kampen is a robust and widely- 
applicable analytic approximation method for the solution of the Master 
equation. 

Chapter 6 - Fokker-Planck Equation: This is a partial differential 
diffusion equation with drift that governs the entire probability distribu- 
tion — a generalization of the diffusion equation used by Einstein in his 
study of Brownian motion. A popular analytic method of approximating 
the evolution of a Markov process. There are some ambiguities and es- 
sential difficulties associated with this method when applied to systems 
with nonlinear transition probabilities. 


Before returning to the dynamic (stochastic differential equation) methods
of Langevin, Ornstein and Uhlenbeck, we must first formulate the
calculus of random functions.

Chapter 7 - Stochastic Analysis: We shall be interested in the mean- 
square limits of sequences as the foundation of continuity and the opera- 
tions of differentiation and integration. More refined ideas of integration 
will be required to provide a formal meaning to expressions involving 
white noise as the canonical noise source (Itô’s calculus). 

Chapter 8 - Random Differential Equations: The differential equation- 
based modeling of Langevin, Ornstein and Uhlenbeck has many appealing 
features. We examine generalizations of their analysis, with particular fo- 
cus on numerical simulation and analytic approximation of solutions. 


Combining the ideas of the previous chapters, we characterize the effect 
of fluctuations on the macroscopic behaviour of nonlinear systems. 
Chapter 9 - Macroscopic effects of noise: There are situations where 
stochastic models exhibit behaviour that is in sharp contrast with deter- 
ministic models of the same system. We examine several such cases, and 
the analytic tools that can be used to study these effects. 


Chapter 10 - Special Topics: Some additional topics are included 
which are beyond the scope of the course, though perhaps of interest to 
some readers. 


1.4 Suggested References 


Much of the content on Brownian motion comes from C. W. Gardiner’s 
book: 


• Handbook of stochastic methods (3rd Ed.), C. W. Gardiner (Springer, 
2004). 


Beyond the first chapter, it is difficult reading. As claimed, it is a hand- 
book, so it is a great starting point if you’re looking for specific informa- 
tion about a specific stochastic process. 


One of the finest books on stochastic processes is by N. G. van Kampen: 


• Stochastic processes in physics and chemistry (2nd Ed.), N. G. van
Kampen (North-Holland, 2001).


It, too, is difficult reading, but well worth the effort. He doesn’t shy away
from making emphatic and unambiguous statements that help in learning
the material. The exercises are excellent: difficult and rewarding.


A good supplemental text is the statistical mechanics book by Reif: 


• Fundamentals of statistical and thermal physics, F. Reif (McGraw-Hill,
1965).


Exercises


1. Fill in the steps that connect Eq. 1.28 to Eq. 1.30. It may be helpful
to treat $\phi_1(t_1 - t_2)$ in Eq. 1.29 as an idealized delta function $\delta(t_1 - t_2)$
(see Section B.4.2).


2. Harmonically-bound particle: Repeat Ornstein and Uhlenbeck’s
analysis to calculate $\langle x(t) \rangle$ and $\langle x^2(t) \rangle$ for the harmonically-bound
particle, Eq. 1.45.


3. Benveniste: Read the paper “Human basophil degranulation triggered
by very dilute antiserum against IgE,” by Davenas et al. Notice
the disclaimer appended to the end of the article by the editor.
Briefly (one or two sentences) summarize what the authors claim.
Why does the editor say that ‘[t]here is no physical basis for such
an activity?’ Specifically, if the results reported in this paper are
correct, what parts of the course would need to be revised?


4. Fick’s law: A substance with concentration $c(x,t)$ undergoing diffusive
transport in one-dimension obeys the diffusion equation:

\[ \frac{\partial c}{\partial t} = D \frac{\partial^2 c}{\partial x^2} = -\frac{\partial J}{\partial x}, \tag{1.46} \]

to a good level of approximation. Here, the flux term $J = -D \, \partial c / \partial x$
was discovered empirically by Fick (1855). At the time, the dependence
of the flux upon the concentration gradient was interpreted
as meaning particles bump into each other in locations of high concentration
and consequently move to areas of lower concentration.
That interpretation persists in many textbooks. How does the physical
interpretation change given Einstein’s work on Brownian motion
(1905)?


5. Delbrück and Luria. Read S.E. Luria and M. Delbrück (1943)
Mutations of bacteria from virus sensitivity to virus resistance.
Genetics 28: 491-511.

(a) What is the problem Luria and Delbrück set out to study?
How will they distinguish among the possible hypotheses?

(b) What relation can you see between this article and Einstein’s
article on Brownian motion? What parameter in the work of
Luria and Delbrück plays the role of Avogadro’s number in
Einstein’s analysis?

(c) It was primarily this work that resulted in Luria and Delbrück
winning the Nobel prize in 1969 (along with Hershey). Why
do you think this work is so significant?


6. Brownian motion (Smoluchowski’s model): Smoluchowski
formulated Brownian motion as a random walk along a discrete lattice.
In a certain limit, Smoluchowski’s formulation coincides with
Einstein’s diffusion equation.

(a) Suppose each step has length $\Delta$ and is taken after a time $\tau$,
with equal probability of moving to the left or to the right.
Using a centered-difference scheme in $\Delta$, find an expression
for

\[ \frac{p_m(s+1) - p_m(s)}{\tau}, \]

where $p_m(s)$ is the probability the particle, beginning at lattice
site $m = 0$ at $t = 0$, is at lattice site $m$ after $s$ steps ($m$ and $s$
are integers, with $s \geq 0$). Taking the limit $\Delta, \tau \to 0$ in such a
way that,

\[ \frac{\Delta^2}{2\tau} = D, \qquad m\Delta \to x, \qquad s\tau = t, \]

show that the discrete random walk reduces to Einstein’s diffusion
equation,

\[ \frac{\partial P}{\partial t} = D \frac{\partial^2 P}{\partial x^2}. \]

(b) Suppose the movement is biased, with probability $p = \frac{1}{2} + \beta\Delta$
of moving to the left and $q = \frac{1}{2} - \beta\Delta$ of moving to the right
after each step (where $\Delta$ must be small enough that $q > 0$).
Under the same limiting conditions as above, how does the
partial differential equation for $P(x,t)$ change?
Solve this equation for $P(x,t)$ subject to $P(x,t) \to 0$ and
$\partial P/\partial x \to 0$ as $x \to \infty$, and a reflecting barrier at $x = 0$.
That is, when the particle reaches the point $x = 0$ it must, in
the next step, move $\Delta$ to the right.

(c) Suppose a state-dependent transition rate, so that the probability
of moving in either direction depends upon the position
of the particle. More precisely, if the particle is at $k\Delta$ the
probabilities of moving right or left are,

\[ \frac{1}{2} \left( 1 - \frac{k}{R} \right), \qquad \frac{1}{2} \left( 1 + \frac{k}{R} \right), \]

respectively. $R$ is a given integer, and possible positions of the
particle are limited by the condition $-R \leq k \leq R$. Derive the
difference equation for the probability $p_m(s)$ for occupying the
site $m$ after $s$ moves (if the initial site was $m = 0$).

Under the same limiting conditions as above, and with $R \to
\infty$ and $1/(R\tau) \to \gamma$, derive the partial differential equation for
$P(x,t)$. Solve for $P(x,t)$, subject to the boundary conditions
$P(x,t) \to 0$ and $\partial P/\partial x \to 0$ as $x \to \pm\infty$.


7. One-dimensional random walk with hesitation: Consider a
random walk along a discrete lattice with unit steps, with equal
probabilities for moving to the right and to the left, given by $p =
q = \lambda\tau$ (so that the probability to remain on a site is $r = 1 - 2\lambda\tau$),
where $\tau$ is the time between consecutive steps and $\lambda$ is a given
constant.

(a) Write a balance equation for the probability $p_m(s)$ for occupying
the site $m$ after $s$ moves (if the initial site was $m = 0$).
Take the $\tau \to 0$ limit in that equation to derive the following
continuous-time equation,

\[ \frac{dp_m(t)}{dt} = \lambda \left[ p_{m-1}(t) + p_{m+1}(t) - 2p_m(t) \right], \]

where $t = s\tau$ is a finite observation time. (Note that throughout,
$m$ is an integer.)

(b) Show that,

\[ p_m(t) = \frac{1}{2\pi i} \oint z^{-m-1} Q(z,t) \, dz, \]

where $Q(z,t)$ is the probability-generating function,

\[ Q(z,t) = \sum_{m=-\infty}^{\infty} z^m p_m(t), \tag{1.47} \]

with $z$ a complex number, and the integral $\oint$ running along
the unit circle $|z| = 1$ in the complex plane.

(c) Derive a differential equation for the probability-generating
function and show that its solution, subject to the appropriate
initial condition, is given by $Q(z,t) = \exp\left[ (z + z^{-1} - 2) \lambda t \right]$.

(d) Show that the solution of the equation from part 7a is given
by,

\[ p_m(t) = e^{-2\lambda t} I_{|m|}(2\lambda t), \]

where $I_n(x)$ is a modified Bessel function of integer order $n$.


(e) Deduce that the leading term of the asymptotic series for the
probability density function from part 7d, when $t, m \to \infty$ with
$m^2/t$ fixed, has the form,

\[ p_m(t) \simeq \frac{1}{\sqrt{4\pi\lambda t}} \exp\left[ -\frac{m^2}{4\lambda t} \right]. \]

Note: In parts 7d and 7e, you may use any standard handbook
of mathematical methods and formulas, such as the book by
Abramowitz and Stegun.


8. Monte Carlo simulation of random walks: Often various prop- 
erties of stochastic processes are not amenable to direct computation 
and simulation becomes the only option. Estimate the solution of 
the following problems by running a Monte Carlo simulation. 


(a) Consider a one-dimensional unbiased random walk, with a reflecting
barrier at $x = 0$ and an absorbing barrier at $x = n$
(any particle that reaches $x = n$ disappears with probability
1). For a system that starts at $x = 0$, on average how many
steps must it take before it is absorbed at $x = n$? Run your
simulations for various $n$, and deduce a general formula. This
is a simple enough example that the answer can be checked
analytically; verify your formula obtained from simulation
against the analytic solution.


Figure 1.6: From a given vertex, the probability to move along any adja- 
cent strand is equal. If the fly can make it to one of the arrows, then it 
is free. If the spider catches the fly, then the fly is eaten. A. Spider web 
for Exercise 8(b)i, 8(b)ii and 8(b)iii. B. Spider web for Exercise 8(b)iv. 


(b) Consider a two-dimensional random walk along a spider web
(Fig. 1.6). A fly lands on the web at some vertex.

i. Suppose the fly dies of fright, and the spider moves blindly
along the web to find it. On average, how many steps must
the spider take in order to find the fly? How does that
number change as the number of radial vertices increases?
(In Fig. 1.6, there are 3 radial vertices.) Can you imagine
a mapping of this problem to a one-dimensional random
walk? Verify your simulation results with an analytic estimate.

ii. Suppose the web is very dusty so that the fly can easily
move along every strand (for simplicity, assume the time
to traverse along a strand is independent of the strand
length). For a spider asleep at the center of the web, what
is the probability of escape for a fly landing on any given
vertex of the web? Compute the escape probability at every
vertex for 3, 4, and 5 radial vertices. Hint: Use symmetry
arguments to cut down the number of simulations
you must run.

iii. Now assume the spider can move, but slowly. Repeat 8(b)ii
with a spider that moves 1 step for every 2 steps taken by
the fly.

iv. Compute the escape probability for the fly that lands in
the tattered web shown in Fig. 1.6B, if the spider is asleep
at the center. Suppose the fly dies of fright, and the spider
moves blindly along the web to find it. On average, how
many steps must the spider take in order to find the fly?

(c) Suppose that in a coin-toss game with unit stakes, Albert bets
on heads, and Paul bets on tails. The probability that Albert
leads in $2r$ out of $2n$ tosses is called the lead probability $P_{2r,2n}$.
For a coin tossed 20 times, what is the most likely number of
tosses for which Albert is in the lead? Guess first, then run
Monte Carlo simulations to make a table of $P_{2r,2n}$ for $n = 10$
and $r = 0, 1, 2, \ldots, 10$. Follow the convention that in the event
of a tie, the previous leader retains the lead.


CHAPTER 2 


RANDOM PROCESSES 


By the time the Ornstein and Uhlenbeck paper appeared, it had become 
clear that Brownian motion was a phenomenon that could not be han- 
dled in a rigorous fashion by classical probability theory. For one thing, 
although Brownian motion is certainly a random affair, the variables that 
describe it are not the ordinary random variables of classical probability 
theory, but rather they are random functions. Roughly speaking, these 
are functions which are specified by the results of an observation, and 
which can take on different values when the observation is repeated many 
times. For another, although Wiener (1922) had put a special case of 
Brownian motion on a sound mathematical basis, it was clear that Brow- 
nian motion was just one example of this kind of random phenomenon 
(with many more cases popping up in science and engineering), and it was 
not obvious whether Wiener’s methods could be generalized. 

A phenomenon similar to Brownian motion is that of fluctuations in 
electric circuits. Strictly speaking, the current through a conductor is 
always a random function of time, since the thermal motion of the elec- 
trons produces uncontrollable fluctuations (thermal “noise”). This kind 
of noise was becoming increasingly important in the 1920’s in electrical 
engineering due to the increasing importance of the vacuum tube — recall 
the pioneering work of van der Pol (1924) in this area. These vacuum 
tubes are always a source of considerable noise because of fluctuations in 
the number of electrons passing through the tube during identical time 
intervals (shot effect) and because of fluctuations in the cathode emission
(flicker noise). In radio receivers, one not only observes noise arising in the
electrical circuits themselves, but also random changes in the level of the
received signals due to scattering of electromagnetic waves caused by
inhomogeneities in the refractive index of the atmosphere (fading) and the
influence of random electrical discharges (meteorological and industrial
noise).

Figure 2.1: Examples of Random Functions. A) Height of a protein
molecule above a catalytic surface. B) Fluctuations in the magnetic field
around the North pole. C) Example of a random field: the cosmic
background radiation.

Electrical noise, albeit very important, is far from a unique case. As 
other examples, we may cite the pressure, temperature and velocity vector 
of a fluid particle in a turbulent flow — in particular, at a point in the 
Earth’s atmosphere. These random functions depend on four variables 
(the three space variables and time): they are called random fields. The 
list goes on... 

To illustrate the appearance of the observed values of random func- 
tions, see Figure 2.1. Curves (and contours) of this type are obtained by 
experiment, that is, they are the result of observations. They represent 
realizations of the random functions; they are also called sample functions 
or sample paths. 

Their general appearance shows that a deterministic description of the 
phenomenon would be so complicated that the resulting mathematical 
model (if possible at all) would be practically impossible to solve. 


2.1 Random processes - Basic concepts 


Suppose we are given an experiment specified by its outcomes $\omega \in \Omega$ ($\Omega$ =
the set of all possible outcomes), and by the probability of occurrence of
certain subsets of $\Omega$. For example, in the tossing of a fair die, $\Omega =
\{1,2,3,4,5,6\}$, the probability of each outcome is independent of the
previous toss and equal to $P = \frac{1}{6}$. A sample experiment might be
$\omega = \{5,4,6,4,3,2\}$.

To every outcome $\omega$ we now assign, according to a certain rule, a
function of time $\xi(\omega,t)$, real or complex. We have thus created a family
of functions, one for each $\omega$. This family is called a stochastic process
(or a random function). Usually, $t \in \mathbb{R}$, although it could also be that
$t \in [0,T]$.

A stochastic process is a function of two variables, $\omega$ and $t$. There are
two points of view:


1. Fix $\omega$, then $\xi(\omega,t) = \xi^{(\omega)}(t)$ is a (real, say) function of time, depending
upon the parameter $\omega$. To each outcome $\omega$, there corresponds a
function of $t$. This function is called a realization, or sample function
of the stochastic process.


2. Fix $t$, then $\xi(\omega,t) = \xi_t(\omega)$ is a family of random variables depending
upon the parameter $t$.


In that way, a stochastic process can be regarded as either a family of
realizations $\xi^{(\omega)}(t)$, or as a family of random variables $\xi_t(\omega)$. This may
seem a pedantic distinction, but notice that Einstein’s point of view was to
treat Brownian motion as a distribution of a random variable describing
position ($\xi_t(\omega)$), while Langevin took the point of view that Newton’s
laws of motion apply to an individual realization ($\xi^{(\omega)}(t)$).
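The two points of view can be made concrete with a toy process. In the sketch below the outcomes are those of a single toss of a fair die, and the rule $\xi(\omega,t) = \omega \cos t$ is a purely hypothetical choice made for illustration; it does not come from the text. Fixing $\omega$ returns an ordinary function of $t$; fixing $t$ returns a list of (value, probability) pairs, i.e. a random variable.

```python
import math

OUTCOMES = [1, 2, 3, 4, 5, 6]  # Omega for one toss of a fair die


def xi(omega, t):
    """Hypothetical rule assigning a function of time to each outcome."""
    return omega * math.cos(t)


def realization(omega):
    """Fix omega: what remains is an ordinary function of t (a sample path)."""
    return lambda t: xi(omega, t)


def random_variable(t):
    """Fix t: what remains is a random variable, listed as (value, probability)."""
    return [(xi(omega, t), 1.0 / len(OUTCOMES)) for omega in OUTCOMES]


def mean_at(t):
    """Ensemble average of the random variable xi_t."""
    return sum(value * prob for value, prob in random_variable(t))


if __name__ == "__main__":
    path = realization(5)                       # one sample path
    print([round(path(t), 3) for t in (0.0, 0.5, 1.0)])
    print(random_variable(0.0))                 # the random variable at t = 0
    print(mean_at(0.0))
```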

Remarks: 


1. From a purely mathematical point of view, in order to specify a 
stochastic process we have to provide the probability (or probability 
density) of occurrence of the various realizations. This leads to the 
definition of a particular measure P (the probability measure) on 
the function space of realizations; by specifying this measure, we 
specify the stochastic process. 


This approach, originating in Kolmogorov’s work, has been mathe- 
matically very successful and fruitful; in particular, it can be shown 
to include as special cases all other ways of specifying a random 
process. Unfortunately, for us, it requires the use of advanced ideas 
from set and measure theory (see STAT 901/902); since the use of 
such techniques is not essential for our subsequent considerations, 
we shall develop the theory from a more physical point of view. 


2. We shall demand that, for a fixed $t$, $\xi_t(\omega)$ be a random variable in
the classical sense. Moreover, we will denote the stochastic process
by $\xi(t)$, with the dependence on $\omega$ understood.


3. Equality of two stochastic processes will mean their respective functions
are identical for any outcome $\omega_i$; i.e.

\[ \xi(t) = \zeta(t) \quad \text{means} \quad \xi_t(\omega_i) = \zeta_t(\omega_i). \]

Similarly, we define the operations of addition and multiplication.


Figure 2.2: Frequency interpretation of an ensemble of sample
paths.


Because of the above interpretation, various probability concepts may
be immediately generalized to stochastic processes. For example, if $\xi(t)$
is a real stochastic process, then its cumulative distribution function is
given by,

\[ F(x;t) = P\{\xi(t) \leq x\} \tag{2.1} \]

(see Eq A.1 on page 257). The meaning of Eq. 2.1 is: Given $x, t \in \mathbb{R}$,
$F(x;t)$ is the probability of the event $\{\xi(t) \leq x\}$ consisting of all outcomes
$\omega$ such that $\xi_t(\omega) \leq x$. More intuitively, we can adopt the frequency
interpretation of Eq. 2.1: An experiment is repeated $n$ times and a sample
path observed each time (see Figure 2.2). Now given two numbers $(x,t)$,
we count the total number $n_t(x)$ of times that (at a given $t$), the ordinates
of the observed functions do not exceed $x$; then $F(x;t) \approx \frac{n_t(x)}{n}$.
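The frequency interpretation translates directly into a counting procedure. The sketch below uses a symmetric $\pm 1$ random walk as a stand-in for the observed sample paths (an assumption made for illustration; any ensemble of paths would do) and estimates $F(x;t)$ as $n_t(x)/n$. The function names are hypothetical.

```python
import random


def make_ensemble(n_paths, n_steps, seed=0):
    """Generate n_paths realizations of a symmetric +/-1 random walk, xi(0) = 0."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_paths):
        position, path = 0, [0]
        for _ in range(n_steps):
            position += rng.choice((-1, 1))
            path.append(position)
        ensemble.append(path)
    return ensemble


def empirical_cdf(ensemble, t, x):
    """Estimate F(x; t) as n_t(x) / n: the fraction of paths with xi(t) <= x."""
    n_t = sum(1 for path in ensemble if path[t] <= x)
    return n_t / len(ensemble)


if __name__ == "__main__":
    paths = make_ensemble(n_paths=20000, n_steps=4)
    # After two steps the walk sits at -2, 0, +2 with probabilities 1/4, 1/2, 1/4,
    # so the exact value of F(0; 2) is 3/4.
    print(empirical_cdf(paths, t=2, x=0))
```

For this walk the estimate can be compared with the exact value $F(0;2) = 3/4$; the agreement improves as the number of repetitions $n$ grows.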

As in classical probability theory, the probability density corresponding
to the cumulative distribution $F(x;t)$ is given by,

\[ f(x;t) = \frac{\partial F(x;t)}{\partial x}. \]

Next, given two instants $t_1$ and $t_2$, consider the random variables $\xi(t_1)$
and $\xi(t_2)$ and define their joint cumulative distribution by,

\[ F(x_1, x_2; t_1, t_2) = P\{\xi(t_1) \leq x_1, \xi(t_2) \leq x_2\}, \tag{2.2} \]

with the joint probability density defined analogously,

\[ f(x_1, x_2; t_1, t_2) = \frac{\partial^2 F(x_1, x_2; t_1, t_2)}{\partial x_1 \partial x_2}, \tag{2.3} \]

and so on for higher-order joint distributions. In general,

\[ F(x_1, \ldots, x_n; t_1, \ldots, t_n) = P\{\xi(t_1) \leq x_1, \ldots, \xi(t_n) \leq x_n\}, \tag{2.4} \]

\[ f(x_1, \ldots, x_n; t_1, \ldots, t_n) = \frac{\partial^n F(x_1, \ldots, x_n; t_1, \ldots, t_n)}{\partial x_1 \cdots \partial x_n}. \tag{2.5} \]

Note that the $n^{\text{th}}$-order distribution function determines all lower-order
distribution functions, and in fact, the $n^{\text{th}}$-order distribution function
completely determines the stochastic process.
Remarks: 


1. The cumulative distribution function Eq. 2.4 is not arbitrary, but
must satisfy the following conditions,

• Symmetry condition. For every permutation $(j_1, j_2, \ldots, j_n)$ of
$(1, 2, \ldots, n)$, we must have

\[ F(x_{j_1}, x_{j_2}, \ldots, x_{j_n}; t_{j_1}, t_{j_2}, \ldots, t_{j_n}) = F(x_1, x_2, \ldots, x_n; t_1, t_2, \ldots, t_n). \]

This condition follows from the AND logic of the cumulative
distribution function: the statements ‘A AND B AND C’ and
‘A AND C AND B’ are equivalent.

• Compatibility condition. For $m < n$, we must have

\[ F(x_1, \ldots, x_m, \infty, \ldots, \infty; t_1, \ldots, t_m, t_{m+1}, \ldots, t_n) = F(x_1, \ldots, x_m; t_1, \ldots, t_m). \]

This statement again follows from the AND logic: the statements
‘A AND B AND C’ and ‘A AND TRUE AND C’ are
equivalent.


2. Kolmogorov (1931) proved that given any set of cumulative distribution
functions satisfying the two conditions above, there exists
a probability space $\Omega$ and a stochastic process $\xi(\omega,t)$ such that
$F(x_1, \ldots, x_n; t_1, \ldots, t_n)$ gives the joint cumulative distribution of
$\xi_{t_1}(\omega), \ldots, \xi_{t_n}(\omega)$.

As a further extension, we introduce the conditional probability density,


\[ f(x_2, t_2 | x_1, t_1) = \frac{f(x_1, x_2; t_1, t_2)}{f(x_1; t_1)}, \tag{2.6} \]

\[ f(x_{k+1}, \ldots, x_n; t_{k+1}, \ldots, t_n | x_1, \ldots, x_k; t_1, \ldots, t_k) = \frac{f(x_1, \ldots, x_n; t_1, \ldots, t_n)}{f(x_1, \ldots, x_k; t_1, \ldots, t_k)}. \]

In particular, we have,

\[ f(x_1, \ldots, x_n; t_1, \ldots, t_n) = f(x_n, t_n | x_{n-1}, \ldots, x_1; t_{n-1}, \ldots, t_1) \times \cdots \times f(x_2, t_2 | x_1, t_1) \times f(x_1, t_1). \]

Intuitively, the relationship between the conditional and joint probabilities
is very straightforward. Focusing upon the two variable case,

\[ f(x_1, x_2; t_1, t_2) = f(x_2, t_2 | x_1, t_1) \times f(x_1, t_1), \]

simply means the probability that the state is $x_2$ at time $t_2$ and $x_1$ at $t_1$
is equal to the probability that the state is $x_2$ at time $t_2$ given that it was
$x_1$ at $t_1$ multiplied by the probability that the state actually was $x_1$ at $t_1$.
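A small discrete example may help fix the bookkeeping. The sketch below replaces the densities by probabilities on a two-state system observed at two times; the numerical values and the names `p1`, `cond`, `joint` and `marginal_t2` are invented for illustration. It builds the joint distribution from the conditional and one-time distributions, then recovers the one-time distribution at $t_2$ by summing over $x_1$.

```python
# Two-state example (states 0 and 1) observed at times t1 < t2.
# All numbers below are assumed values, chosen only for illustration.
p1 = {0: 0.3, 1: 0.7}                      # f(x1, t1)
cond = {0: {0: 0.9, 1: 0.1},               # cond[x1][x2] = f(x2, t2 | x1, t1)
        1: {0: 0.4, 1: 0.6}}


def joint(x1, x2):
    """f(x1, x2; t1, t2) = f(x2, t2 | x1, t1) * f(x1, t1)."""
    return cond[x1][x2] * p1[x1]


def marginal_t2(x2):
    """Summing the joint distribution over x1 gives the distribution at t2."""
    return sum(joint(x1, x2) for x1 in p1)


if __name__ == "__main__":
    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, joint(x1, x2))
    print(marginal_t2(0), marginal_t2(1))
```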


2.1.1 Moments of a Stochastic Process 


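We define the mean of the stochastic process by,

\[ \mu(t) = \langle \xi(t) \rangle = \int_{-\infty}^{\infty} x f(x;t) \, dx; \tag{2.7} \]

the correlation function by,

\[ B(t_1, t_2) = \langle \xi(t_1) \xi(t_2) \rangle = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_1 x_2 f(x_1, x_2; t_1, t_2) \, dx_1 dx_2. \tag{2.8} \]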
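In practice the mean and the correlation function are often estimated by averaging over an ensemble of sample paths rather than by integrating against the densities. The sketch below does this for a symmetric $\pm 1$ random walk started at zero, a process chosen here only as an assumed test case because its exact moments are easy to state: $\mu(t) = 0$ and $B(t_1,t_2) = \min(t_1,t_2)$. The function names are hypothetical.

```python
import random


def make_ensemble(n_paths, n_steps, seed=0):
    """Generate n_paths realizations of a symmetric +/-1 random walk, xi(0) = 0."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_paths):
        position, path = 0, [0]
        for _ in range(n_steps):
            position += rng.choice((-1, 1))
            path.append(position)
        ensemble.append(path)
    return ensemble


def mean(ensemble, t):
    """Ensemble estimate of mu(t) = <xi(t)>."""
    return sum(path[t] for path in ensemble) / len(ensemble)


def correlation(ensemble, t1, t2):
    """Ensemble estimate of B(t1, t2) = <xi(t1) xi(t2)>."""
    return sum(path[t1] * path[t2] for path in ensemble) / len(ensemble)


if __name__ == "__main__":
    ens = make_ensemble(n_paths=20000, n_steps=5)
    print(mean(ens, 3))            # exact value for this walk: 0
    print(correlation(ens, 3, 5))  # exact value for this walk: min(3, 5) = 3
```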
It will be useful in coming chapters to re-write the correlation function
using the definition from the previous section, re-writing the joint distribution
using the conditional distribution,

\[ B(t_1, t_2) = \langle \xi(t_1) \xi(t_2) \rangle = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_1 x_2 \left[ f(x_2, t_2 | x_1, t_1) \times f(x_1; t_1) \right] dx_1 dx_2, \]

\[ B(t_1, t_2) = \langle \xi(t_1) \xi(t_2) \rangle = \int_{-\infty}^{\infty} x_1 \langle x_2 \rangle_{x_1} f(x_1; t_1) \, dx_1, \tag{2.9} \]

where $\langle x_2 \rangle_{x_1}$ is the conditional average of $x_2$,

\[ \langle x_2 \rangle_{x_1} = \int_{-\infty}^{\infty} x_2 f(x_2, t_2 | x_1, t_1) \, dx_2, \]

(conditioned so that $\langle x_2(t_1) \rangle = x_1$).
Related to the correlation is the covariance, given by

\[ C(t_1, t_2) = \langle \{\xi(t_1) - \mu(t_1)\} \cdot \{\xi(t_2) - \mu(t_2)\} \rangle \equiv \langle\langle \xi(t_1) \xi(t_2) \rangle\rangle, \tag{2.10} \]


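and so on for higher-order correlations. The double-angled brackets denote
the cumulants of the process: Recall that for a scalar random variable
$\xi$, the cumulants $\kappa_m$ of $\xi$ are defined via a generating function,

\[ \langle e^{-i \xi t} \rangle = \exp\left[ \sum_{m=1}^{\infty} \frac{(-it)^m}{m!} \kappa_m \right], \]

(see Eq. A.9 on page 263). Similarly, the cumulants of a stochastic function
$\xi(t)$, denoted by $\kappa_m \equiv \langle\langle \xi^m \rangle\rangle$, are given by

\[ \left\langle e^{-i \int_0^t \xi(t') \, dt'} \right\rangle = \exp\left[ \sum_{m=1}^{\infty} \frac{(-i)^m}{m!} \int_0^t \cdots \int_0^t \langle\langle \xi(t_1) \xi(t_2) \cdots \xi(t_m) \rangle\rangle \, dt_1 dt_2 \cdots dt_m \right]. \tag{2.11} \]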
The general rule for relating cumulants to the moments of €(t) is to 
partition the digits of the moments into all possible subsets; for each par- 
tition, one writes the product of cumulants, then adds all such products. 
The first few relations will make the prescription clear — Writing 1,2,... 


for &(t1), E(t2), Arar) 


2.2 Stationarity and the Ergodic Theorem 


For systems characterized by linear equations, there is a very convenient relationship between the probability distribution of the fluctuations in equilibrium and the parameters in the model governing the dynamics, called the ‘fluctuation-dissipation relation’. The fluctuation-dissipation relation underlies many results in stochastic processes (including Einstein and Langevin’s treatment of Brownian motion) and its derivation will be the focus of this section. As a prerequisite, we introduce a very useful notion: stationarity.


Definition: The stochastic process $\xi(t)$ is called stationary if all finite dimensional distribution functions defining $\xi(t)$ remain unchanged when the whole group of points is shifted along the time axis; i.e., if

$$F(x_1,x_2,\ldots,x_n;t_1+\tau,t_2+\tau,\ldots,t_n+\tau) = F(x_1,x_2,\ldots,x_n;t_1,t_2,\ldots,t_n), \tag{2.12}$$

for any $n, t_1, t_2, \ldots, t_n$ and $\tau$. In particular, all one-dimensional cumulative distribution functions must be identical (i.e. $F(x,t) = F(x)$ cannot depend upon $t$); all 2-dimensional cumulative distribution functions can only depend upon $|t_1 - t_2|$; and so on.

It follows that the autocorrelation $B$ depends upon time difference only,

$$\langle\xi(t)\xi(s)\rangle = B(t-s) = B(\tau).$$


It turns out that there exists a whole class of phenomena where the underlying stochastic process is completely characterized by the mean $\langle\xi(t)\rangle = \mu = \text{constant}$, and by the correlation function $B(\tau)$. Such stochastic processes are said to be stationary in the wide sense (or in Khinchin’s sense). It should be clear that, in general, Eq. 2.12 above may not hold for stochastic processes which are only stationary in the wide sense.

A very useful property of stationary stochastic processes is that, under fairly weak conditions on the correlation time, they obey the so-called Ergodic Theorem (at least in the first moment). Consider a stochastic process and suppose we know how to compute mathematical expectations; the question arises as to how to compare mathematical averages with those obtained by experiment.

Experimentally, we may proceed as follows: 


1. Take a large number $N$ of records (realizations) of $\xi(t)$, say,

$$\xi^{(1)}(t),\ \xi^{(2)}(t),\ \ldots,\ \xi^{(j)}(t),\ \ldots,\ \xi^{(N)}(t).$$

2. Then we take the arithmetic mean, with respect to $j$, of $\xi^{(j)}(t)$ for every $t$ and claim this equals $\langle\xi(t)\rangle$,

$$\langle\xi(t)\rangle \approx \frac{1}{N}\sum_{j=1}^{N}\xi^{(j)}(t).$$

3. Next, we take the arithmetic mean of the products $\xi^{(j)}(t)\xi^{(j)}(s)$ for every pair $(t,s)$, and claim that this equals $\langle\xi(t)\xi(s)\rangle = B(t,s)$.

4. Proceed analogously with higher-order moments ...


This program is fine in principle, but in practice taking all of these records and processing them as above is a very complicated affair, impossible to carry out in many cases. In fact, it is far easier to take a few long records of the phenomenon; in such a case, we cannot compute the arithmetic mean, but rather we can compute the time averages of the quantities of interest, e.g.,

$$\langle\xi(t)\rangle \approx \frac{1}{T}\int_0^T\xi(t)\,dt,$$

$$\langle\xi(t)\xi(t+\tau)\rangle \approx \frac{1}{T}\int_0^T\xi(t)\xi(t+\tau)\,dt,$$

and so on for higher-order correlations. Here, $T$ is a finite, but very large time. In general, this time-average will not equal the mathematical expectation; however, if the process is stationary and the correlation function decays to zero, then the ergodic theorem guarantees that they coincide in the limit $T\to\infty$.


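Slutsky’s theorem: If $\xi(t)$ is a stationary stochastic process (with $t$ a continuous variable) and,

$$\lim_{T\to\infty}\frac{1}{T}\int_0^T\langle\langle\xi(0)\xi(\tau)\rangle\rangle\,d\tau = 0,$$

then

$$\langle\xi(t)\rangle = \lim_{T\to\infty}\frac{1}{T}\int_0^T\xi(t)\,dt,$$

$$\langle\xi(t)\xi(t+\tau)\rangle = \lim_{T\to\infty}\frac{1}{T}\int_0^T\xi(t)\xi(t+\tau)\,dt,\ \text{etc.}$$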
Figure 2.3: A long sample of a stationary process can be used in
place of many (shorter) repeated samples.


The sufficient condition to guarantee this mean-ergodicity is quite weak: if,

$$\lim_{\tau\to\infty}\langle\langle\xi(0)\xi(\tau)\rangle\rangle = 0,$$

that is sufficient to conclude that the stationary (possibly in the wide sense) process $\xi(t)$ is mean-ergodic. For a proof of Slutsky’s theorem, or the sufficiency condition, see Papoulis “Probability, Random Variables, and Stochastic Processes.”

A nice pictorial way of thinking about the content of the ergodic theorem is the following. Say we have a single, very long record $\xi^\star(t)$ of our stochastic process $\xi(t)$, something like Figure 2.3. Now suppose this single record is cut into a large number $N$ of pieces of length $T$; since the process is stationary, each $\xi^{(j)}(t)$ may be regarded as a separate measurement, and the arithmetic average can be performed,

$$\langle\xi(t)\xi(t+\tau)\rangle = B(\tau) \approx \frac{1}{N}\sum_{j=1}^{N}\xi^{(j)}(t)\,\xi^{(j)}(t+\tau),$$

and so on. The ergodic theorem says that in the limit $N\to\infty$, the time-averages and the arithmetic averages coincide.
In practice, since $\xi(t)$ cannot (in general) be expressed by a closed formula, the calculation of the averages is performed as follows. The record $\xi^\star(t)$ is digitized and the time integrals replaced by their approximating sums,

$$\frac{1}{T}\int_0^T\xi^\star(t)\,dt \approx \frac{1}{N}\sum_{k=1}^{N}\xi^\star(k\Delta),$$

so that,

$$\langle\xi(t)\rangle \approx \frac{1}{N}\sum_{k=1}^{N}\xi^\star(k\Delta), \tag{2.13}$$

where $\Delta$ is a small time interval, and $N$ is chosen to make $N\Delta = T$ sufficiently large. Usually, $\Delta$ is chosen in such a way that the function $\xi^\star(t)$ does not change appreciably during $\Delta$, while the number $N$ should be so large that a further increase has a negligible effect on the value computed via Eq. 2.13.
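As a rough numerical illustration of the ergodic idea, the sketch below compares a time average along one long digitized record with an ensemble average over many short records. The process used is an assumption made for the example (a stationary, zero-mean, unit-variance AR(1) sequence, whose correlation at lag $n$ is exactly $a^n$); the function names, seed, and parameter values are likewise hypothetical.

```python
import math
import random


def ar1_record(n_steps, a, rng):
    """One realization of a stationary, zero-mean, unit-variance AR(1) sequence."""
    x = rng.gauss(0.0, 1.0)            # start from the stationary distribution
    s = math.sqrt(1.0 - a * a)         # keeps the variance equal to one
    out = [x]
    for _ in range(n_steps - 1):
        x = a * x + s * rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def time_average_corr(record, lag):
    """Digitized time average of xi(t) * xi(t + lag) along a single record."""
    n = len(record) - lag
    return sum(record[k] * record[k + lag] for k in range(n)) / n


def ensemble_average_corr(records, lag):
    """Arithmetic mean over records of xi(0) * xi(lag)."""
    return sum(r[0] * r[lag] for r in records) / len(records)


rng = random.Random(0)
a, lag = 0.9, 5
long_record = ar1_record(200_000, a, rng)
short_records = [ar1_record(lag + 1, a, rng) for _ in range(20_000)]

b_time = time_average_corr(long_record, lag)         # one long record
b_ens = ensemble_average_corr(short_records, lag)    # many short records
b_exact = a ** lag                                   # exact correlation of this model
```

Both estimates should land near `b_exact`, within sampling error, which is the practical content of replacing many repeated records by one long one.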


2.3 Spectral Density and Correlation Functions

In practice, it is often much easier to measure the Fourier transform of a signal rather than trying to compute the autocorrelation function directly. The two representations are closely related, as we shall now show. First, we define,

$$\hat{X}(\omega) = \int_0^T\xi(t)e^{-i\omega t}\,dt.$$

Then, the fluctuation spectrum, or spectral density of the fluctuations, is defined as,

$$S(\omega) = \lim_{T\to\infty}\frac{1}{2\pi T}\,|\hat{X}(\omega)|^2.$$

After some manipulation, the connection between the (time-averaged) autocorrelation function and the fluctuation spectrum can be made explicit,

$$S(\omega) = \lim_{T\to\infty}\frac{1}{\pi}\int_0^T\cos(\omega\tau)\left[\frac{1}{T}\int_0^{T-\tau}\xi(t)\xi(t+\tau)\,dt\right]d\tau,$$

or,

$$S(\omega) = \frac{1}{\pi}\int_0^{\infty}\cos(\omega\tau)B(\tau)\,d\tau.$$

Written another way,

$$S(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-i\omega\tau}B(\tau)\,d\tau \iff B(\tau) = \int_{-\infty}^{\infty}e^{i\omega\tau}S(\omega)\,d\omega. \tag{2.14}$$

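The conjugate relationship between the autocorrelation function $B(\tau)$ and the fluctuation spectrum $S(\omega)$ is called the Wiener-Khinchin theorem (see Fig. 2.4). Furthermore, for a stationary process $\xi(t)$, the time-average may be replaced by the ensemble average via the ergodic theorem, and we write,

$$\langle\xi(t)\xi(t+\tau)\rangle = B(\tau).$$

Using the Fourier transform of $\xi(t)$,

$$\Xi(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\xi(t)e^{-i\omega t}\,dt,$$

and assuming $\langle\xi(t)\rangle = 0$, we have the following,

$$\langle\Xi(\omega)\rangle = \frac{1}{2\pi}\int_{-\infty}^{\infty}\langle\xi(t)\rangle e^{-i\omega t}\,dt = 0,$$

$$\langle\Xi(\omega)\,\Xi^*(\omega')\rangle = \delta(\omega-\omega')S(\omega). \tag{2.15}$$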


In the next chapter, we use the fluctuation spectrum S(w) to derive the 
very useful fluctuation-dissipation relation. 
2.3.1 Linear Systems 


For linear systems with stochastic forcing $\eta(t)$,

$$\left[a_n\frac{d^n}{dt^n} + a_{n-1}\frac{d^{n-1}}{dt^{n-1}} + \ldots + a_0\right]y(t) = \eta(t), \tag{2.16}$$

the correlation function of the process $y(t)$ is related in a very simple way to the correlation of the forcing $\eta(t)$. To show this relationship explicitly, we begin with the function $H(\omega)$, which is called the transfer function of the linear operator acting on $y(t)$ and is given by the Fourier transform of the solution $h(t)$ of the auxiliary equation,

$$\left[a_n\frac{d^n}{dt^n} + a_{n-1}\frac{d^{n-1}}{dt^{n-1}} + \ldots + a_0\right]h(t) = \delta(t),$$

(where $\delta(t)$ is the Dirac-delta function and $h(t)$ is often called the impulse response of the system). Writing $H(\omega)$ explicitly,

$$H(\omega) = \left[a_n(i\omega)^n + a_{n-1}(i\omega)^{n-1} + \ldots + a_0\right]^{-1}.$$

The utility of the impulse response $h(t)$ is that the particular solution of an inhomogeneous differential equation can always be written as a convolution of the forcing function with $h(t)$. Eq. 2.16, for example, has the formal solution,

$$y(t) = \int_{-\infty}^{\infty}\eta(t-\tau')h(\tau')\,d\tau'.$$

Figure 2.4: Correlation functions and spectra. A) Example correlation functions. B) The corresponding fluctuation spectra. As the correlation time decreases, the autocorrelation becomes more narrowly peaked, while the spectrum broadens; this is a general feature of correlation functions and spectra, implied by the Wiener-Khinchin theorem.
If we multiply $\eta(t+\tau)$ by $y^*(t)$ (where the $*$ denotes the complex conjugate) and take the average, then

$$\langle\eta(t+\tau)y^*(t)\rangle = \int_{-\infty}^{\infty}\langle\eta(t+\tau)\eta^*(t-\tau')\rangle h^*(\tau')\,d\tau'.$$

After a change of variables, and a Fourier transform,

$$S_{\eta y}(\omega) = S_{\eta\eta}(\omega)H^*(\omega). \tag{2.17}$$

In a similar way, we also have,

$$S_{yy}(\omega) = H(\omega)S_{\eta y}(\omega). \tag{2.18}$$

Combining Eqs. 2.17 and 2.18 yields the fundamental relation,

$$S_{yy}(\omega) = S_{\eta\eta}(\omega)H(\omega)H^*(\omega) = S_{\eta\eta}(\omega)\times|H(\omega)|^2. \tag{2.19}$$


As an example, consider a linear system with damping driven by a stochastic function with zero mean and a delta-correlation function, $\langle\eta(t)\eta(t')\rangle = \Gamma\delta(t-t')$,

$$\frac{dy}{dt} = -\beta y + \eta(t). \tag{2.20}$$

The transfer function $H(\omega)$ is simply $H(\omega) = 1/(i\omega+\beta)$, so that

$$S_{yy}(\omega) = \frac{S_{\eta\eta}(\omega)}{\omega^2+\beta^2} = \frac{1}{2\pi}\frac{\Gamma}{\omega^2+\beta^2}. \tag{2.21}$$

Taking the inverse Fourier transform,

$$\langle y(t)y(t-\tau)\rangle = \frac{\Gamma}{2\beta}e^{-\beta|\tau|}. \tag{2.22}$$

Eq. 2.20 is Langevin’s equation for the velocity of a Brownian particle, so (2.22) is the autocorrelation function for the Ornstein-Uhlenbeck process at steady-state.
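The pair formed by the exponential correlation function (2.22) and the Lorentzian spectrum (2.21) can be checked numerically. The sketch below evaluates the cosine-transform form of the Wiener-Khinchin relation, $S(\omega) = \frac{1}{\pi}\int_0^\infty\cos(\omega\tau)B(\tau)\,d\tau$, with a simple trapezoid rule. The parameter values, the cutoff `tau_max`, and the function names are illustrative assumptions.

```python
import math


def spectrum_from_correlation(omega, corr, tau_max, n=20000):
    """Trapezoid-rule estimate of (1/pi) * integral_0^tau_max cos(omega*tau) * corr(tau) dtau."""
    h = tau_max / n
    total = 0.5 * (corr(0.0) + math.cos(omega * tau_max) * corr(tau_max))
    for k in range(1, n):
        tau = k * h
        total += math.cos(omega * tau) * corr(tau)
    return total * h / math.pi


gamma, beta = 2.0, 1.5   # illustrative noise strength and damping constant


def ou_correlation(tau):
    """Eq. 2.22: Gamma * exp(-beta*|tau|) / (2*beta)."""
    return gamma * math.exp(-beta * abs(tau)) / (2.0 * beta)


def lorentzian(omega):
    """Eq. 2.21: Gamma / (2*pi*(omega^2 + beta^2))."""
    return gamma / (2.0 * math.pi * (omega ** 2 + beta ** 2))


# Each entry is (omega, numerical spectrum, closed-form spectrum).
checks = [(w, spectrum_from_correlation(w, ou_correlation, 40.0 / beta), lorentzian(w))
          for w in (0.0, 1.0, 3.0)]
```

The cutoff at forty correlation times makes the truncated tail negligible, so the second and third entries of each tuple in `checks` should agree closely.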


2.3.2 Fluctuation-Dissipation Relation 


For linear systems, we can obtain the relation between the correlation function of the process $y(t)$ and the correlation function of the forcing function $\eta(t)$ by directly calculating the autocorrelation of $y(t)$. If this is combined with the fundamental result above, we obtain the so-called fluctuation-dissipation relation, relating the noise magnitude $\Gamma$ to the other physical parameters in the model. We can write a generic version of Langevin’s equation as the following phenomenological damping law governing the variable $y$, subject to random forcing $\eta(t)$:

$$\frac{dy}{dt} = -\beta y + \eta(t), \tag{2.23}$$

where $\beta$ is the damping constant and the random forcing $\eta(t)$ has the statistical properties,

$$\langle\eta(t)\rangle = 0, \tag{2.24}$$

and,

$$\langle\eta(t)\eta(t')\rangle = \Gamma\delta(t-t'). \tag{2.25}$$


(Compare with Eq. 1.29 on page 17.) Noise with a Dirac-delta correla- 
tion function is called white noise because one can easily show that the 
fluctuation spectrum of the Dirac-delta is a constant; i.e. all frequencies 
are equally represented in the spectrum, in analogy with white light that 
contains all frequencies of visible light (see Eq. B.36 on p. 293). For the 
linear system (2.23), it is possible to compute the fluctuation spectrum of 
$y$ without using Eq. 2.25! To that end, note that the general solution of Eq. 2.23 is

$$y(t) = y_0e^{-\beta t} + \int_0^te^{-\beta(t-t')}\eta(t')\,dt'. \tag{2.26}$$

Taking the average, conditioned by the initial value $y(0) = y_0$,

$$\langle y(t)\rangle_{y_0} = y_0e^{-\beta t}, \tag{2.27}$$

since the average of $\eta(t)$ vanishes by Eq. 2.24. Calculating the autocorrelation function,

$$\langle y(0)\langle y(t)\rangle_{y_0}\rangle^{eq} = \langle y^2\rangle^{eq}e^{-\beta t}, \tag{2.28}$$

where $\langle y^2\rangle^{eq}$ is given by the variance of the equilibrium distribution, as calculated by equilibrium statistical mechanics (equipartition of energy). The spectral density is simply,

$$S_{yy}(\omega) = \frac{1}{\pi}\langle y^2\rangle^{eq}\frac{\beta}{\beta^2+\omega^2}. \tag{2.29}$$

Notice that up to this point, Eq. 2.25 has not been used. From the fundamental relation above (Eq. 2.19),

$$(\beta^2+\omega^2)\,S_{yy}(\omega) = S_{\eta\eta}(\omega). \tag{2.30}$$

By Eq. 2.14, the delta-correlated forcing of Eq. 2.25 has the flat spectrum $S_{\eta\eta}(\omega) = \Gamma/2\pi$. Now using Eq. 2.25 and Eq. 2.30, together with Eq. 2.29, we arrive at,

$$\Gamma = 2\langle y^2\rangle^{eq}\beta, \tag{2.31}$$

relating the magnitude of the fluctuations ($\Gamma$) to the magnitude of the phenomenological dissipation ($\beta$). This is a statement of the fluctuation-dissipation relation. It says that in order to achieve thermal equilibrium, the diffusive effects of the fluctuations must be balanced by the dissipative effects of the drag. A version of this relation was used by Einstein in his study of Brownian motion to relate the diffusion coefficient $D$ to the drag force experienced by the particle, thereby obtaining an explicit relation between the mean-squared displacement and Boltzmann’s constant (cf. Eq. 1.13 on page 7). The fluctuation-dissipation relation was later used by Nyquist and Johnson in the study of thermal noise in a resistor. That work is described in more detail below and provides another example of using the fluctuations to quantify phenomenological parameters (in this case, again, Boltzmann’s constant).
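The balance expressed by Eq. 2.31 can be seen in a direct simulation. The sketch below is a minimal Euler-Maruyama discretization of Eq. 2.23, chosen here for illustration and not taken from the text: over a step `dt` the white-noise forcing contributes a Gaussian kick of variance `gamma * dt`. The step size, run length, seed, and parameter values are all assumptions.

```python
import math
import random


def stationary_variance(beta, gamma, dt, n_steps, seed=0):
    """Time-averaged y^2 from Euler-Maruyama steps of dy/dt = -beta*y + eta(t)."""
    rng = random.Random(seed)
    kick = math.sqrt(gamma * dt)      # std. dev. of the integrated noise over one step
    burn_in = n_steps // 10           # discard the initial transient
    y, sum_sq, count = 0.0, 0.0, 0
    for step in range(n_steps):
        y += -beta * y * dt + kick * rng.gauss(0.0, 1.0)
        if step >= burn_in:
            sum_sq += y * y
            count += 1
    return sum_sq / count


beta, gamma = 1.0, 0.5                                   # illustrative values
var_est = stationary_variance(beta, gamma, dt=0.01, n_steps=500_000)
gamma_from_fdr = 2.0 * var_est * beta                    # Eq. 2.31 read backwards
```

With these settings `gamma_from_fdr` should recover the input `gamma` to within a few percent; the residual reflects both sampling error and the finite step size.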


2.3.3 Johnson and Nyquist


• J. B. Johnson (1928) “Thermal agitation of electricity in conductors,” Physical Review 32: 97-109.

• H. Nyquist (1928) “Thermal agitation of electric charge in conductors,” Physical Review 32: 110-113.


In 1928, J. B. Johnson observed random fluctuations of current through resistors of various materials. Most importantly, he observed that the power in the fluctuations scaled linearly with temperature. In an article immediately following, Nyquist provided a theoretical framework for Johnson’s observations. His main tools were, like Langevin’s, the equipartition of energy theorem from equilibrium statistical mechanics and the fluctuation-dissipation theorem. The consequence of Nyquist’s analysis was that the linear dependence of the fluctuation power on the ambient temperature yields Boltzmann’s constant (which is proportional to the slope of the line). We shall consider their work in more detail below, starting with the work of Nyquist and applying the fluctuation-dissipation relation to a simple RC-circuit (Figure 2.5).
Figure 2.5: Johnson-Nyquist circuit. Capacitance $C$ shorted through a resistance $R$ at temperature $T$. The current $I(t)$ is $dQ/dt$. Redrawn from D. Lemons “An introduction to stochastic processes in physics,” (Johns Hopkins University Press, 2002).


Nyquist uses essentially a completely verbal argument based on equilibrium thermodynamics to arrive at the following conclusions:

• To a very good approximation, the fluctuations should be uniformly distributed among all frequencies (i.e., the noise is white).

• At equilibrium, the average energy in each degree of freedom will be $k_BT$, where $k_B$ is Boltzmann’s constant and $T$ is the absolute temperature of the system. Of that energy, one half is magnetic and the other half is electric.


We consider a specific circuit: a charged capacitor $C$ discharging through a resistance $R$, all at a fixed temperature $T$ (Figure 2.5). In the absence of fluctuations, Kirchhoff’s law gives,

$$IR + \frac{Q}{C} = 0,$$

where $I$ is the current in the circuit and $Q$ is the charge on the capacitor. Johnson observed variability in the current, represented by a white noise source $\eta(t)$. Writing $I = dQ/dt$ leads to the Langevin equation

$$\frac{dQ}{dt} = -\frac{1}{RC}Q + \eta(t). \tag{2.32}$$


This equation has the canonical form discussed in the previous section, with damping coefficient $\beta = 1/RC$. From the equipartition of energy, we know that the mean electrical energy stored in the capacitor is,

$$\frac{\langle Q^2\rangle}{2C} = \frac{k_BT}{2},$$

or, $\langle Q^2\rangle = Ck_BT$. From the fluctuation-dissipation relation (Eq. 2.31), we immediately have the variance of the fluctuations $\Gamma$,

$$\langle I^2\rangle = \Gamma = 2\langle Q^2\rangle\beta = \frac{2k_BT}{R}. \tag{2.33}$$

This result is more commonly written as a fluctuating voltage $V$ applied to the circuit. By Ohm’s law, we have $V = IR$, so that,

$$\langle V^2\rangle = \langle I^2\rangle\cdot R^2 = 2Rk_BT, \tag{2.34}$$

called Nyquist’s theorem. Notice that $\langle V^2\rangle/R$ should be linear in $T$; that is precisely what Johnson observed (Figure 2.6). Using his setup, Johnson obtained an averaged estimate for Boltzmann’s constant of $(1.27 \pm 0.17)\times10^{-23}$ J/K, as compared to the currently accepted value of $1.38\times10^{-23}$ J/K.

Figure 2.6: Johnson’s measurement of $\langle V^2\rangle/R$ as a function of $T$. From the fluctuation-dissipation relation (Eq. 2.34), the slope is proportional to Boltzmann’s constant. Copied from Figure 6 of Johnson’s paper.
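The chain of relations from equipartition to Nyquist’s theorem is simple arithmetic, sketched below. The function names, the component values, and the synthetic temperature series are hypothetical choices for illustration; only the formulas (Eqs. 2.31, 2.33 and 2.34) come from the text, and the synthetic data are generated from Eq. 2.34 itself, not from Johnson’s measurements.

```python
K_B = 1.380649e-23  # Boltzmann's constant in J/K (SI defining value)


def johnson_nyquist(resistance, capacitance, temperature):
    """Return (beta, <Q^2>, Gamma, <V^2>) for the RC circuit of Figure 2.5."""
    beta = 1.0 / (resistance * capacitance)     # damping coefficient in Eq. 2.32
    q2 = capacitance * K_B * temperature        # equipartition: <Q^2> = C k_B T
    gamma = 2.0 * q2 * beta                     # Eq. 2.33: equals 2 k_B T / R
    v2 = gamma * resistance ** 2                # Eq. 2.34: equals 2 R k_B T
    return beta, q2, gamma, v2


def boltzmann_from_slope(temps, v2_over_r):
    """Least-squares slope through the origin of <V^2>/R against T, divided by 2."""
    slope = sum(t * y for t, y in zip(temps, v2_over_r)) / sum(t * t for t in temps)
    return slope / 2.0


# Hypothetical component values: 1 megaohm, 1 nanofarad, room temperature.
beta, q2, gamma, v2 = johnson_nyquist(1.0e6, 1.0e-9, 300.0)

# Synthetic, noise-free data built from Eq. 2.34, used only to exercise the fit.
temps = [100.0, 200.0, 300.0]
synthetic = [2.0 * K_B * t for t in temps]
k_b_fit = boltzmann_from_slope(temps, synthetic)
```

The fit divides the slope by two because Eq. 2.34 gives $\langle V^2\rangle/R = 2k_BT$; with real measurements the recovered constant would carry the experimental scatter.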
Suggested References


Two superb books dealing with correlation functions and spectra from an applied perspective are,

• Probability, random variables, and stochastic processes (2nd Ed.), A. Papoulis (McGraw-Hill, 1984).

• Introduction to random processes with applications to signals and systems (2nd Ed.), W. A. Gardner (McGraw-Hill, 1990).

The second book (by Gardner) is particularly recommended for electrical engineering students or those wishing to use stochastic processes to model signal propagation and electrical networks.


Exercises 


1. White noise: It is often useful, in cases where the time-scale of the fluctuations is much shorter than any characteristic time scale in the system of interest, to represent the fluctuations as white noise.

(a) Show that for $\delta$-correlated fluctuations, $B(\tau) = \delta(\tau)$, the fluctuation spectrum is constant. Such a process is called white noise since the spectrum contains all frequencies (in analogy with white light).

(b) Assume $\xi$ has units such that $\langle\xi^2\rangle \propto$ Energy; then,

$$\int_{\omega_1}^{\omega_2}S(\omega)\,d\omega = E(\omega_1,\omega_2),$$

is the energy lying between frequencies $\omega_1$ and $\omega_2$. For a white-noise process, show that the energy content of the fluctuations is infinite! That is to say, no physical process can be exactly described by white noise.


2. Linear systems: Systems such as Langevin’s model of Brownian motion, with linear dynamic equations, filter fluctuations in a particularly simple fashion.

(a) Fill in the details to derive Eqs. 2.17 and 2.18.

(b) For systems characterized by several variables, the correlation function is generalized to the correlation matrix, defined either by the dot product: $\langle\mathbf{y}(t)\cdot\mathbf{y}(t')^T\rangle = \mathbf{C}(t,t')$ (where $T$ is the transpose), or element-wise: $\langle y_i(t)y_j(t')\rangle = C_{ij}(t,t')$. For a stationary process, $C_{ij}(t,t') = C_{ij}(t-t')$; use the same line of reasoning as above to generalize Eq. 2.19 to higher dimensions, and show that,

$$\mathbf{S}_{yy}(\omega) = \mathbf{H}(\omega)\cdot\mathbf{S}_{\eta\eta}(\omega)\cdot\mathbf{H}^T(-\omega), \tag{2.35}$$

where $\mathbf{S}_{yy}(\omega)$ is the matrix of the spectra of $C_{ij}(t-t')$ and $\mathbf{H}$ is the matrix Fourier transform.


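3. MacDonald’s theorem: Let $Y(t)$ be a fluctuating current at equilibrium (i.e. $Y(t)$ is stationary). It is often easier to measure the transported charge $Z(t) = \int_0^tY(t')\,dt'$. Show that the spectral density of $Y$, $S_{YY}(\omega)$, is related to the transported charge fluctuations by MacDonald’s theorem (written here with the spectrum normalized as in Eq. 2.14),

$$S_{YY}(\omega) = \frac{\omega}{2\pi}\int_0^{\infty}\sin(\omega t)\,\frac{d}{dt}\langle Z^2(t)\rangle\,dt.$$

Hint: First show $(d/dt)\langle Z^2(t)\rangle = 2\int_0^t\langle Y(0)Y(t')\rangle\,dt'$.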


4. Let the stochastic process $X(t)$ be defined by $X(t) = A\cos(\omega t) + B\sin(\omega t)$, where $\omega$ is constant and $A$ and $B$ are random variables. Show that,

(a) If $\langle A\rangle = \langle B\rangle = \langle AB\rangle = 0$, and $\langle A^2\rangle = \langle B^2\rangle$, then $X(t)$ is stationary in the wide sense.

(b) If the joint probability density function for $A$ and $B$ has the form $f_{AB}(a,b) = h\left(\sqrt{a^2+b^2}\right)$, then $X(t)$ is stationary in the strict sense.


5. Integrated Ornstein-Uhlenbeck process: Define $Z(t) = \int_0^tY(t')\,dt'$ $(t > 0)$, where $Y(t)$ is an Ornstein-Uhlenbeck process with

$$\langle Y(t)\rangle = 0 \quad\text{and}\quad \langle Y(t)Y(t-\tau)\rangle = \frac{\Gamma e^{-\beta|\tau|}}{2\beta}.$$

$Z(t)$ is Gaussian, but neither stationary nor Markovian.

(a) Find $\langle Z(t_1)Z(t_2)\rangle$.

(b) Calculate $\langle\cos[Z(t_1) - Z(t_2)]\rangle$. It may be helpful to consider the cumulant expansion of the exponential of a random process, Eq. 2.11 on page 35.

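6. Reciprocal spreading and the Wiener-Khinchin theorem: Suppose $g(t)$ is any non-periodic, real-valued function with Fourier transform $G(\omega)$ and finite energy:

$$W = \int_{-\infty}^{\infty}g^2(t)\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty}|G(\omega)|^2\,d\omega < \infty.$$

For the following, it may be useful to use the Cauchy-Schwarz inequality (p. 296) and consider the integral,

$$\int_{-\infty}^{\infty}t\,g(t)\,\frac{dg}{dt}\,dt.$$

(a) Writing the ‘uncertainty’ in time as $\sigma_t$,

$$\sigma_t^2 = \int_{-\infty}^{\infty}t^2\,\frac{g^2(t)}{W}\,dt,$$

and the ‘uncertainty’ in frequency as $\sigma_\omega$,

$$\sigma_\omega^2 = \int_{-\infty}^{\infty}\omega^2\,\frac{|G(\omega)|^2}{2\pi W}\,d\omega,$$

show that

$$\sigma_t\cdot\sigma_\omega \geq \frac{1}{2}.$$

(b) For what function $g(t)$ does $\sigma_t\cdot\sigma_\omega = \frac{1}{2}$?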


CHAPTER 3


MARKOV PROCESSES


3.1 Classification of Stochastic Processes 


Because the $n$th-order distribution function $F(x_1,x_2,\ldots,x_n;t_1,t_2,\ldots,t_n)$, or density $f(x_1,x_2,\ldots,x_n;t_1,t_2,\ldots,t_n)$, completely determines a stochastic process, this leads to a natural classification system. Assuming the time ordering $t_1 < t_2 < \ldots < t_n$, we then identify several examples of stochastic processes:

1. Purely Random Process: Successive values of $\xi(t)$ are statistically independent, i.e.

$$f(x_1,\ldots,x_n;t_1,\ldots,t_n) = f(x_1,t_1)\cdot f(x_2,t_2)\cdots f(x_n,t_n);$$

in other words, all the information about the process is contained in the first-order density. Obviously, if the $n$th-order density factorizes, then all lower-order densities do as well, e.g. from

$$f(x_1,x_2,x_3;t_1,t_2,t_3) = f(x_1,t_1)\cdot f(x_2,t_2)\cdot f(x_3,t_3),$$

it follows that,

$$f(x_1,x_2;t_1,t_2) = f(x_1,t_1)\cdot f(x_2,t_2).$$

However, the converse is not true.
2. Markov Process: Defined by the fact that the conditional probability density enjoys the property,

$$f(x_n,t_n|x_1,\ldots,x_{n-1};t_1,\ldots,t_{n-1}) = f(x_n,t_n|x_{n-1},t_{n-1}). \tag{3.1}$$

That is, the conditional probability density at $t_n$, given the value $x_{n-1}$ at $t_{n-1}$, is not affected by the values at earlier times. In this sense, the process is “without memory.”

A Markov process is fully determined by the two functions $f(x_1,t_1)$ and $f(x_2,t_2|x_1,t_1)$; the whole hierarchy can be reconstructed from them. For example, with $t_1 < t_2 < t_3$,

$$f(x_1,x_2,x_3;t_1,t_2,t_3) = f(x_3,t_3|x_1,x_2;t_1,t_2)\cdot f(x_2,t_2|x_1,t_1)\cdot f(x_1,t_1).$$

But,

$$f(x_3,t_3|x_1,x_2;t_1,t_2) = f(x_3,t_3|x_2,t_2),$$

by the Markov property, so,

$$f(x_1,x_2,x_3;t_1,t_2,t_3) = f(x_3,t_3|x_2,t_2)\cdot f(x_2,t_2|x_1,t_1)\cdot f(x_1,t_1). \tag{3.2}$$

The algorithm can be continued. This property makes Markov processes manageable, and in many applications (for example Einstein’s study of Brownian motion), this property is approximately satisfied by the process over the coarse-grained time scale (see p. 59).

Notice, too, the analogy with ordinary differential equations. Here, we have a process $f(x_1,x_2,x_3,\ldots;t_1,t_2,t_3,\ldots)$ with a propagator $f(x_{i+1},t_{i+1}|x_i,t_i)$ carrying the system forward in time, beginning with the initial distribution $f(x_1,t_1)$. This viewpoint is developed in detail in the book by D. T. Gillespie (1992) Markov Processes.

3. Progressively more complicated processes may be defined in a similar way, although usually very little is known about them. In some cases, however, it is possible to add a set of auxiliary variables to generate an augmented system that obeys the Markov property. See N. G. van Kampen (1998) Remarks on non-Markov processes. Brazilian Journal of Physics 28:90-96.
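The reconstruction of the hierarchy from an initial distribution and a propagator (Eq. 3.2) can be illustrated with a discrete-state analogue. The three-state chain below is an assumption made for the example: `p1` stands in for $f(x_1,t_1)$ and the matrix `T` for a time-homogeneous transition probability, with sums replacing integrals.

```python
from itertools import product

# Hypothetical three-state chain: p1[i] is the initial distribution and
# T[i][j] is the probability of moving from state i to state j in one step.
p1 = [0.2, 0.5, 0.3]
T = [[0.7, 0.2, 0.1],
     [0.3, 0.4, 0.3],
     [0.1, 0.4, 0.5]]

states = range(3)


def joint3(x1, x2, x3):
    """Discrete analogue of Eq. 3.2: propagator times propagator times initial law."""
    return T[x2][x3] * T[x1][x2] * p1[x1]


def cond_x3_given_x1_x2(x3, x1, x2):
    """f(x3 | x1, x2), computed from the joint distribution alone."""
    f12 = sum(joint3(x1, x2, z) for z in states)
    return joint3(x1, x2, x3) / f12


# The reconstructed joint distribution must be normalized.
total = sum(joint3(*xs) for xs in product(states, repeat=3))

# Largest deviation of f(x3 | x1, x2) from the one-step propagator T[x2][x3].
max_memory = max(abs(cond_x3_given_x1_x2(x3, x1, x2) - T[x2][x3])
                 for x1, x2, x3 in product(states, repeat=3))
```

Here `max_memory` vanishes up to rounding: once $x_2$ is given, the value of $x_1$ carries no further information about $x_3$, which is the Markov property of Eq. 3.1.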


WARNING: In the physical literature, the adjective Markovian is used with ‘regrettable looseness’ as van Kampen says. The term seems to have a magical appeal, which invites its use in an intuitive sense not covered by the definition. In particular:

• When a physicist talks about a “process,” a certain phenomenon involving time is usually what is being referred to. It is meaningless to say a “phenomenon” is Markovian (or not) unless one specifies the variables (especially the time scale) to be used for its description.

• Eq. 3.1 is a condition on all the probability densities; one simply cannot say that a process is Markovian if only information about the first few of them is available. On the other hand, if one knows the process is Markovian, then of course $f(x_1,t_1)$ and $f(x_2,t_2|x_1,t_1)$ do suffice to specify the entire process.


3.2 Chapman-Kolmogorov Equation


First some remarks about terminology before deriving this fundamental 
result. 


1. We shall call those variables appearing on the left (right) of the conditional line in a conditional probability distribution the left variables (right variables). For example,

$$f(\underbrace{x_1,x_2,x_3,x_4,x_5}_{\text{Left Variables}}\ |\ \underbrace{x_6,x_7,x_8,x_9,x_{10}}_{\text{Right Variables}}).$$

Sometimes one wants to remove a left (or right) variable; this can be done according to the following rules,

• To remove a number of left variables, simply integrate with respect to them,

$$f(x_1|x_3) = \int_{-\infty}^{\infty}f(x_1,x_2|x_3)\,dx_2.$$

• To remove a number of right variables, multiply by their conditional density with respect to the remaining right variables and integrate,

$$f(x_1|x_4) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}f(x_1|x_2,x_3,x_4)\,f(x_2,x_3|x_4)\,dx_2dx_3.$$




2. From now on, the ensemble average (or $E\{\cdot\}$) will be denoted by $\langle\cdots\rangle$,

$$\langle x(t)\rangle = \int_{-\infty}^{\infty}x\,f(x,t)\,dx.$$

3. Recall the definition of the distribution function of $n$ random variables $\xi_1, \xi_2, \ldots, \xi_n$:

$$F(x_1,\ldots,x_n) = P\{\xi_1\leq x_1,\ldots,\xi_n\leq x_n\}.$$

This, of course, is the joint distribution function, and the joint density function is obtained by differentiation with respect to $\{x_1,\ldots,x_n\}$. Obviously,

$$f(x_1,\ldots,x_n)\geq 0,$$

$$F(\infty,\ldots,\infty) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f(x_1,\ldots,x_n)\,dx_1\ldots dx_n = 1.$$

If we substitute in the distribution function certain variables by $\infty$, we obtain the joint distribution function of the remaining variables; e.g.

$$F(x_1,x_3) = F(x_1,\infty,x_3,\infty).$$

If we integrate the joint density function $f(x_1,\ldots,x_n)$ with respect to certain variables, we obtain the joint density of the remaining variables (called the marginal density); e.g.

$$f(x_1,x_3) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}f(x_1,x_2,x_3,x_4)\,dx_2dx_4.$$
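The rules for removing left and right variables can be verified on a small discrete distribution, with sums in place of integrals. The sketch below uses an arbitrary, made-up joint table over three binary variables (deliberately not Markovian) and computes $f(x_1|x_3)$ three ways; all names and weights are hypothetical.

```python
from itertools import product

states = (0, 1)

# Hypothetical positive weights, normalized into a joint table f(x1, x2, x3).
weights = {xs: 1.0 + xs[0] + 2.0 * xs[1] * xs[2] + 3.0 * xs[0] * xs[2]
           for xs in product(states, repeat=3)}
norm = sum(weights.values())
f = {xs: w / norm for xs, w in weights.items()}


def f3(x3):
    """Marginal f(x3): sum out x1 and x2."""
    return sum(f[(a, b, x3)] for a in states for b in states)


def f23(x2, x3):
    """Marginal f(x2, x3): sum out x1."""
    return sum(f[(a, x2, x3)] for a in states)


def direct(x1, x3):
    """f(x1 | x3) from the definition: marginal f(x1, x3) divided by f(x3)."""
    return sum(f[(x1, b, x3)] for b in states) / f3(x3)


def remove_left(x1, x3):
    """Sum f(x1, x2 | x3) over the unwanted left variable x2."""
    return sum(f[(x1, b, x3)] / f3(x3) for b in states)


def remove_right(x1, x3):
    """Sum f(x1 | x2, x3) * f(x2 | x3) over the unwanted right variable x2."""
    return sum((f[(x1, b, x3)] / f23(b, x3)) * (f23(b, x3) / f3(x3)) for b in states)


max_gap = max(max(abs(direct(a, c) - remove_left(a, c)),
                  abs(direct(a, c) - remove_right(a, c)))
              for a in states for c in states)
```

All three routes agree up to rounding (`max_gap` is essentially zero), even though the table has no Markov structure; the rules are consequences of the definition of conditional probability alone.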


We now proceed with the derivation of the Chapman-Kolmogorov equation. As we have seen, a Markov process is fully determined by the two functions $f(x_1,t_1)$ and $f(x_2,t_2|x_1,t_1)$. This, however, does not mean that these two functions can be chosen arbitrarily, for they must also obey two important identities. The first one follows from the definition of the conditional density,

$$f(x_1,x_2;t_1,t_2) = f(x_2,t_2|x_1,t_1)\cdot f(x_1,t_1), \tag{3.3}$$

(where $f(x_2,t_2|x_1,t_1)$ can be thought of as the transition probability for reasons that will become clear below when we discuss the master equation). By integration over $x_1$, we immediately have,

$$f(x_2,t_2) = \int_{-\infty}^{\infty}f(x_2,t_2|x_1,t_1)\,f(x_1,t_1)\,dx_1. \tag{3.4}$$

The second identity is obtained from Eq. 3.2,

$$f(x_1,x_2,x_3;t_1,t_2,t_3) = f(x_3,t_3|x_2,t_2)\cdot f(x_2,t_2|x_1,t_1)\cdot f(x_1,t_1) \quad\text{(Eq. 3.2)}.$$

With $t_1 < t_2 < t_3$, integration over $x_2$ gives,

$$f(x_1,x_3;t_1,t_3) = f(x_1,t_1)\int_{-\infty}^{\infty}f(x_3,t_3|x_2,t_2)\,f(x_2,t_2|x_1,t_1)\,dx_2,$$

and, using $f(x_1,x_3;t_1,t_3) = f(x_3,t_3|x_1,t_1)\cdot f(x_1,t_1)$ (Eq. 3.3),

$$f(x_3,t_3|x_1,t_1) = \int_{-\infty}^{\infty}f(x_3,t_3|x_2,t_2)\,f(x_2,t_2|x_1,t_1)\,dx_2, \tag{3.5}$$

which is known as the Chapman-Kolmogorov Equation (Figure 3.1). It is a functional equation relating all conditional probability densities $f(x_i,t_i|x_j,t_j)$ for a Markov process, where the time ordering in the integrand is essential. The converse is also true: if $f(x_1,t_1)$ and $f(x_2,t_2|x_1,t_1)$ obey the consistency condition, Eqs. 3.4 and 3.5, then they uniquely define a Markov process.

Figure 3.1: Chapman-Kolmogorov Equation. The intermediate variable $x_2$ is integrated over to provide a connection between $x_1$ and $x_3$.

Remarks:

1. The Chapman-Kolmogorov equation is a functional equation for the transition probability $f(x_i,t_i|x_j,t_j)$; its solution would give us a complete description of any Markov process. Unfortunately, no general solution to this equation is known.

2. From the meaning of $f(x,t|x_0,t_0)$, it is clear that we must have,

$$f(x,t|x_0,t_0)\to\delta(x-x_0)\quad\text{as}\quad t\to t_0.$$

For example, an important Markov process is the one whose transition probability is normal; i.e. the Gaussian,

$$f(x_2,t_2|x_1,t_1) = \frac{1}{\sqrt{4\pi D(t_2-t_1)}}\exp\left[-\frac{(x_2-x_1)^2}{4D(t_2-t_1)}\right].$$

Convince yourself that it obeys the Chapman-Kolmogorov equation; and, if one chooses $f(x_1,0) = \delta(x_1)$ and uses this transition probability, it follows from Eq. 3.4 that,

$$f(x,t) = \frac{1}{\sqrt{4\pi Dt}}\exp\left[-\frac{x^2}{4Dt}\right],\quad(t > 0). \tag{3.6}$$

This is the probability density of the so-called Wiener-Lévy process. It is a Markov process, but despite appearances, it is not a stationary stochastic process, because

$$\langle\xi(t_1)\xi(t_2)\rangle = 2D\min(t_1,t_2),$$

(which is not a function of time-difference only; see Exercise 1). Notice, however, that the transition probability $f(x_2,t_2|x_1,t_1)$ does depend on $(x_2-x_1,\,t_2-t_1)$ only. This process describes the position of a Brownian particle according to Einstein.


3. For a stationary Markov process, it is convenient to use a special notation. We let,

$$f(x_2,t_2|x_1,t_1) \equiv p(x_2|x_1,\tau),\quad \tau = t_2-t_1,$$

in terms of which the Chapman-Kolmogorov equation reads,

$$p(x_3|x_1,\tau+\tau') = \int_{-\infty}^{\infty}p(x_3|x_2,\tau')\,p(x_2|x_1,\tau)\,dx_2, \tag{3.7}$$

where $\tau' = t_3-t_2$, and $(\tau',\tau) > 0$ because of the time-ordering. We also have,

$$\int_{-\infty}^{\infty}p(x_2|x_1,\tau)\,dx_2 = 1,$$

$$\int_{-\infty}^{\infty}p(x_2|x_1,\tau)\,f(x_1)\,dx_1 = f(x_2),$$

$$B(\tau) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}x_1x_2\,p(x_2|x_1,\tau)\,f(x_1)\,dx_1dx_2,$$

the latter applying when the stationary Markov process has zero mean.

The Ornstein-Uhlenbeck process is the best-known example of a stationary Markov process; it is defined by:

$$f(x_1) = \frac{1}{\sqrt{\pi(D/\beta)}}\exp\left[-\frac{x_1^2}{(D/\beta)}\right],$$

$$p(x_2|x_1,\tau) = \frac{1}{\sqrt{\pi(D/\beta)\left(1-e^{-2\beta\tau}\right)}}\exp\left[-\frac{\left(x_2-x_1e^{-\beta\tau}\right)^2}{(D/\beta)\left(1-e^{-2\beta\tau}\right)}\right]. \tag{3.8}$$

This process describes the velocity of a Brownian particle according to Ornstein and Uhlenbeck.

It is straightforward to verify that this process satisfies the Chapman-Kolmogorov equation, and that the correlation function is exponential (Exercise 2),

$$B(\tau)\propto e^{-\beta\tau}.$$

The Ornstein-Uhlenbeck process is stationary, Gaussian and Markovian; Doob (1942) proved that it is “essentially” the only process with these three properties.
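The invitation in Remark 2 to check that the Gaussian transition probability obeys the Chapman-Kolmogorov equation can also be taken up numerically. The sketch below evaluates the integral over the intermediate state in Eq. 3.5 with a trapezoid rule and compares it with the kernel over the combined interval. The diffusion coefficient, the sample points, and the integration window are illustrative assumptions.

```python
import math

D = 0.5  # illustrative diffusion coefficient


def wiener_kernel(x2, x1, dt):
    """Gaussian transition density f(x2, t2 | x1, t1), with dt = t2 - t1 > 0."""
    return math.exp(-(x2 - x1) ** 2 / (4.0 * D * dt)) / math.sqrt(4.0 * math.pi * D * dt)


def chapman_kolmogorov_rhs(x3, x1, dt12, dt23, half_width=20.0, n=4000):
    """Trapezoid-rule estimate of the integral over the intermediate state x2."""
    h = 2.0 * half_width / n
    total = 0.0
    for k in range(n + 1):
        x2 = -half_width + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * wiener_kernel(x3, x2, dt23) * wiener_kernel(x2, x1, dt12)
    return total * h


x1, x3, dt12, dt23 = 0.3, -1.1, 0.7, 1.2
lhs = wiener_kernel(x3, x1, dt12 + dt23)           # direct jump over t3 - t1
rhs = chapman_kolmogorov_rhs(x3, x1, dt12, dt23)   # composed through x2
```

The two numbers `lhs` and `rhs` should agree to many digits, reflecting the fact that convolving two Gaussians adds their variances, $2D(t_2-t_1) + 2D(t_3-t_2) = 2D(t_3-t_1)$. The same routine could be pointed at the Ornstein-Uhlenbeck kernel of Eq. 3.8 by swapping the kernel function.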
3.3 Master Equation


Aside from providing a consistency check, the real importance of the 
Chapman-Kolmogorov equation is that it enables us to build up the con- 
ditional probability densities over the “long” time interval (t),t3) from 
those over the “short” intervals (t),t2) and (t2,t3). It turns out this is 
an incredibly useful property. In physical applications, we generally have 
well-developed theories describing in detail (microscopically) the evolu- 
tion of a system out of equilibrium. These theories are formulated in 
terms of differential equations describing the trajectories of the many 
(~ 107°) particles constituting the system — such as Hamilton’s equations 
in classical mechanics or Schrédinger’s equation in quantum mechanics. 
These descriptions are determinisitic, and if we could solve the initial 
value problems for them, we would not need to think of approximations 
such as the splitting of time scales mentioned above, for we would have 
the solution for all time. [Even in that case, however, we would have a 
big puzzle on our hands; for while our fundamental microscopic theories 
are time-reversal invariant, such symmetry is lost at the macroscopic level 
and irreversibility dominates the evolution of physical phenomena. How 
does this irreversibility come about? This is one of the classic problems of 
statistical mechanics (see M. C. Mackey, Time’s Arrow: The Origins of 
Thermodynamic Behavior, (Dover, 2003), for an interesting discussion).] 

There are several methods in physics that provide the microscopic 
dynamics over very short time scales, generating the transition proba- 
bility between two states during the time interval At, as At > 0 (e.g., 
time-dependent perturbation theory in quantum mechanics). This is not 
enough, of course, for we need to know the evolution of the system on a 
time scale of the order of the time it takes to perform an experiment. 

It is here, namely in bridging these two time scales, that the Markovian 
assumption helps: from our knowledge of the transition probability at 
small times we can build up our knowledge of the transition probability
at all times iteratively from the Chapman-Kolmogorov equation.
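
The iteration can be sketched numerically for a system with a finite number of states. The three-state rate table, the time step and the helper name `short_time_propagator` below are hypothetical choices made for illustration; the short-time propagator is assumed to have the form "identity plus time step times rates", the discrete analogue of Eq. 3.9.

```python
import numpy as np

def short_time_propagator(W, dt):
    """Assumed short-time form T(dt) = I + dt * W (discrete analogue of Eq. 3.9)."""
    return np.eye(W.shape[0]) + dt * W

# Hypothetical rates: rates[n, m] is the transition rate from state m to state n.
rates = np.array([[0.0, 1.0, 0.5],
                  [2.0, 0.0, 1.5],
                  [0.3, 0.7, 0.0]])

# Off-diagonal entries are gains; the diagonal holds minus the total escape rate.
W = rates - np.diag(rates.sum(axis=0))

dt = 1e-3
T_short = short_time_propagator(W, dt)

# Chapman-Kolmogorov: compose 1000 short steps to reach t = 1.
T_long = np.linalg.matrix_power(T_short, 1000)

# Each column of T_long is a conditional distribution and must sum to one.
column_sums = T_long.sum(axis=0)

# Splitting the long interval at an intermediate time gives the same result.
T_split = (np.linalg.matrix_power(T_short, 600)
           @ np.linalg.matrix_power(T_short, 400))
ck_residual = np.abs(T_split - T_long).max()
```

The only physical input is the short-time behaviour; everything at longer times follows from repeated composition, which is exactly the role the Markov assumption plays in the text.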

For a large class of systems, it is possible to show that over very short 
time, the transition probability is, 


\[ p(x|z,\tau') = (1 - a_0 \tau')\,\delta(x - z) + \tau' w(x|z) + o(\tau'), \tag{3.9} \]


where $w(x|z)$ is the transition probability per unit time, and $a_0$ is the
zeroth-jump moment,


\[ a_0(z) = \int_{-\infty}^{\infty} w(x|z)\,dx. \tag{3.10} \]


The physical content of Eq. 3.9 is straightforward. It simply says that
the probability that a transition ($z \to x$) occurs, plus the probability that no
transition occurs during that time (i.e. $z = x$), equals the transition probability
of moving from $z$ to $x$ during time $\tau'$.

This will be the case in systems where the fluctuations arise from
approximating a discrete stochastic process by a continuous deterministic
model, for example, models of chemical reaction kinetics, predator-prey
dynamics, particle collisions, radioactive decay, etc.: any model where
it is the particulate nature of the variables that necessitates a stochastic
formulation.

Substitute Eq. 3.9 into the Chapman-Kolmogorov equation, 


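\[
\begin{aligned}
p(x_3|x_1,\tau+\tau') &= \int_{-\infty}^{\infty} \left[ \left(1 - a_0(x_2)\,\tau'\right) \delta(x_3 - x_2) + \tau' w(x_3|x_2) \right] p(x_2|x_1,\tau)\,dx_2 \\
&= \int_{-\infty}^{\infty} \left(1 - a_0(x_2)\,\tau'\right) \delta(x_3 - x_2)\, p(x_2|x_1,\tau)\,dx_2 + \tau' \int_{-\infty}^{\infty} w(x_3|x_2)\, p(x_2|x_1,\tau)\,dx_2 \\
&= \left(1 - a_0(x_3)\,\tau'\right) p(x_3|x_1,\tau) + \tau' \int_{-\infty}^{\infty} w(x_3|x_2)\, p(x_2|x_1,\tau)\,dx_2 .
\end{aligned}
\]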


Re-arranging, dividing by $\tau'$, and recalling the definition of $a_0$ (cf. Eq. 3.10),
we write


\[ \frac{p(x_3|x_1,\tau+\tau') - p(x_3|x_1,\tau)}{\tau'} = -\int_{-\infty}^{\infty} w(x_2|x_3)\, p(x_3|x_1,\tau)\,dx_2 + \int_{-\infty}^{\infty} w(x_3|x_2)\, p(x_2|x_1,\tau)\,dx_2 . \]


Passing to the limit $\tau' \to 0$, we have¹,


\[ \frac{\partial}{\partial \tau} p(x_3|x_1,\tau) = \int_{-\infty}^{\infty} \left[ w(x_3|x_2)\, p(x_2|x_1,\tau) - w(x_2|x_3)\, p(x_3|x_1,\tau) \right] dx_2, \tag{3.11} \]


which is generally called the Master Equation. Note that it is a conser- 
vation equation of the gain-loss type, and there is of course a discrete 
version, 


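\[ \frac{d}{dt} p_n(t) = \sum_m \left[ w_{nm}\, p_m(t) - w_{mn}\, p_n(t) \right], \tag{3.12} \]


where the label $n$ refers to the possible states of the stochastic process $\xi(t)$.
The transition probabilities $w_{nm}$ denote the probability per unit time of a transition
from state $m$ to state $n$, so that $w_{nm}\,dt$ is the probability of that transition in a
small time increment $dt$.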
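
As a minimal sketch of how the gain-loss structure of Eq. 3.12 is used in practice, the following integrates a two-state master equation by forward-Euler steps. The rate table `w`, the time step and the function name `master_rhs` are hypothetical choices made for illustration.

```python
import numpy as np

def master_rhs(p, w):
    """Right-hand side of Eq. 3.12: sum_m [w[n, m] * p[m] - w[m, n] * p[n]]."""
    gain = w @ p                  # probability flowing into each state n
    loss = w.sum(axis=0) * p      # probability flowing out of each state n
    return gain - loss

# Hypothetical two-state system; w[n, m] is the rate of the jump m -> n.
w = np.array([[0.0, 3.0],
              [1.0, 0.0]])

p = np.array([1.0, 0.0])          # start in state 0 with certainty
dt = 1e-3
for _ in range(20000):            # forward-Euler integration up to t = 20
    p = p + dt * master_rhs(p, w)

total = p.sum()                   # gain and loss cancel, so this stays at one
```

Because every gain term of one state is a loss term of another, the total probability is conserved at each step, and the distribution relaxes to the balance condition $w_{10}\,p_0 = w_{01}\,p_1$ (here $p_0 = 3/4$, $p_1 = 1/4$).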

The big difference between the Chapman-Kolmogorov equation and 
the master equation is that the Chapman-Kolmogorov equation is a nonlinear
equation (in the transition probabilities) that expresses the Markov
character of the process but contains no information about any particular
Markov process. In the master equation, by contrast, one considers
the transition probability at short times, $w(x_j|x_i)$, as a given function
determined by the specific physical system, and the resulting equation is 
linear in the conditional probability density which determines the (meso- 
scopic) state of that system. The derivation and utility of the master 
equation formalism is best appreciated by example. We first consider 
the simplest master equation, the birth-and-death process, which is how 
Smoluchowski formulated his study of Brownian motion. 


¹ This limit is taken in an asymptotic sense: $\tau'$ is much smaller than the characteristic
time scale over which $p(x_3|x_1,\tau)$ varies, yet it is much larger than the time
required for all of the microscopic variables in the system to relax to their equilibrium
distribution; see Section 3.4 on page 59.


3.4 Stosszahlansatz


At this point, it is well worth taking a step back and examining the
various assumptions that have been made so far (Figure 6.3). We begin
with a general functional equation expressing a consistency condition
for Markov processes (the Chapman-Kolmogorov equation). Assuming
the transition probabilities are stationary, we are able to express the
Chapman-Kolmogorov equation as a discrete-differential equation (the
master equation) that turns out to be far more amenable to analysis.
The mathematical derivation is straightforward, but it is important that
the underlying physical assumptions are clear.

The master equation rests upon the assumption that on the time scale
$\Delta t$ over which the observable state evolves, all of the microscopic auxiliary
variables (for example, the motion of the solvent molecules colliding with
a Brownian particle) assume their stationary equilibrium distributions.
Furthermore, the equilibrium distribution of the microscopic variables
at time $t + \Delta t$ depends only upon the state of the system at time $t$.
Thus at every time step $\Delta t$, the microscopic variables are perturbed from
equilibrium, then rapidly relax to their new equilibrium distribution. This
is called the repeated randomness assumption, or the Stosszahlansatz.

What is essential to bear in mind is that the limit $\Delta t \to 0$ used in the
derivation of the master equation is not a true limit in the mathematical
sense (cf. Eq. 3.11 on page 59). Although $\Delta t \to 0$, we must remember
that $\Delta t$ is still long enough on the microscopic scale that the microscopic
variables are allowed to relax to their equilibrium state so that the
microscopic state is (approximately) independent at times $t$ and $t + \Delta t$. A
more detailed description of the physical derivation of the master equation
is provided by van Kampen in “Fluctuations in Nonlinear Systems,”
appended to these notes, and quoted in part below:


We are concerned with systems that consist of a very large
number $N$ of particles. In classical theory, the precise microscopic
state of the system is described by $6N$ variables
$x_1, \ldots, x_{3N}, p_1, \ldots, p_{3N}$. They obey the $6N$ microscopic equations
of motion. The gross, macroscopic aspect of the state is
described by a much smaller number of variables $Q_1, \ldots, Q_n$,
which are functions of $x_1, \ldots, p_{3N}$. For convenience we suppose
that apart from the energy there is just one other $Q$,
and drop the label. Experience tells us the remarkable fact
that this macroscopic variable $Q(x_1, \ldots, p_{3N})$ obeys again a
differential equation


\[ \dot{Q} = F(Q), \tag{3.13} \]


which permits to uniquely determine its future values from its 
value at some initial instant. The phenomenological law (3.13) 
is not a purely mathematical consequence of the microscopic 
equations of motion. The reason why it exists can be roughly
understood as follows. Using the equations of motion one has


\[ \dot{Q} = \sum_k \left( \frac{\partial Q}{\partial x_k}\,\dot{x}_k + \frac{\partial Q}{\partial p_k}\,\dot{p}_k \right) \equiv g(x_1, \ldots, p_{3N}). \]


The variables in $g$ may be expressed in $Q$ and the energy
(which we do not write explicitly), and $6N - 2$ remaining
variables, $\vartheta_\lambda(x_1, \ldots, p_{3N})$ say. Hence


\[ \dot{Q} = f(Q; \vartheta_1, \ldots, \vartheta_{6N-2}). \]


This may also be written


\[ Q(t + \Delta t) - Q(t) = \int_t^{t+\Delta t} f\left[Q(t'); \vartheta(t')\right] dt'. \]


Now suppose that $Q(t)$ varies much more slowly than the $\vartheta_\lambda$
(which is the reason it is macroscopic). It is then possible to
pick $\Delta t$ such that $Q(t)$ does not vary much during $\Delta t$, while
the $\vartheta_\lambda$ practically run through all their possible values (ergodic
theorem with fixed value for $Q$). Hence one may substitute in
the integral $Q(t)$ for $Q(t')$ and replace the time integration by
an average over that part of the phase space that corresponds
to given values of the energy and $Q$:


\[ Q(t + \Delta t) - Q(t) = \Delta t \cdot \left\langle f[Q(t); \vartheta] \right\rangle_{Q(t)} = \Delta t \cdot F[Q(t)]. \]


It should be emphasized that this implies that at each time $t$
the $\vartheta_\lambda$ vary in a sufficiently random way to justify the use of
a phase space average (“repeated randomness assumption”).


Fluctuations arise from the fact that, in the relevant part of
phase space, $f$ is not exactly equal to its average $F$, but has
a probability distribution around it. Hence $Q(t + \Delta t)$ is no
longer uniquely determined by $Q(t)$, but instead there exists a
transition probability $W(q'|q)$. More precisely, $\Delta t\, W(q'|q)\, dq'$
is the probability that, if $Q$ has the value $q$ at time $t$, the value
of $Q(t + \Delta t)$ will lie between $q'$ and $q' + dq'$. The probability
distribution $P(q,t)$ of $Q$ at any time $t$ then obeys the rate
equation


\[ \frac{\partial P(q,t)}{\partial t} = \int \left\{ W(q|q')\,P(q',t) - W(q'|q)\,P(q,t) \right\} dq'. \tag{3.14} \]


This is the general form of the master equation ... Again a
repeated randomness assumption is involved, namely that at
each time the $\vartheta_\lambda$ are sufficiently random to justify the
identification of probability with measure in phase space.


N. G. van Kampen, Fluctuations in Nonlinear Systems.


Figure 3.2: Schematic illustration of a birth-death process


In the following examples, it is the derivation of the master equation 
that is emphasized; discussion of actual solution methods is postponed to 
Chapter 4 — but by all means, read ahead if you’re curious. 


3.5 Example — One-step processes 


Many stochastic processes are of a special type called one-step process,
birth-and-death process or generation-recombination process. They are
continuous-time Markov processes whose range consists of the integers $n$,
and whose transition probability per unit time (i.e. $w_{nm}$) permits only
jumps between adjacent sites,


\[ w_{nm} = r_m\,\delta_{n,m-1} + g_m\,\delta_{n,m+1}, \quad (m \neq n), \]
\[ w_{nn} = 1 - (r_n + g_n), \]


where $r_n$ is the probability per unit time that, being at site $n$, a jump
occurs to site $n - 1$. Conversely, $g_n$ is the probability per unit time that,
being at site $n$, a jump occurs to site $n + 1$ (Figure 3.2).

Smoluchowski applied this model to the study of Brownian motion
by setting the generation and recombination probabilities to 1/2:
$g_n = r_n = 1/2$. Defining the probability density $p_{n|m,s}$ as the probability that
a random walker beginning at $n$ at time $t = 0$ will be at site $m$ after $s$
steps, the master equation for this unbounded random walk is,


\[ p_{n|m,s+1} = \frac{1}{2}\, p_{n|m-1,s} + \frac{1}{2}\, p_{n|m+1,s}. \tag{3.15} \]

Let $\nu = |m - n|$ be the net distance traveled by the random walker, then
one can show that $p_{n|m,s}$ is given by,

\[ p_{n|m,s} = \begin{cases} \dfrac{1}{2^s}\, \dfrac{s!}{\left(\frac{\nu+s}{2}\right)!\,\left(\frac{s-\nu}{2}\right)!}, & \nu \le s \text{ and } \nu + s \text{ even}, \\[2ex] 0, & \text{otherwise}. \end{cases} \tag{3.16} \]


It is possible to show that this discrete random walk formulation is equivalent
to Einstein’s diffusion representation of free Brownian motion
(Exercise 6a).

Note that it is often the case in the literature that the authors will not
explicitly write the conditioning of the distribution by the initial state,
preferring instead the short-hand: $p_{n|m,s} \equiv p_{m,s}$ or $p_m(s)$. One must
always remember, however, that any distribution governed by a master
equation (or any other dynamical equation) is necessarily conditioned by
the initial distribution.
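
As a quick numerical check of the unbounded random walk, the sketch below iterates the master equation (Eq. 3.15) step by step and compares the result with the closed form (Eq. 3.16). The function names and the choice of starting site are illustrative only.

```python
from math import comb

def walk_distribution(n, steps):
    """Iterate Eq. 3.15 for a walker starting at site n; returns {site: probability}."""
    p = {n: 1.0}
    for _ in range(steps):
        nxt = {}
        for m, prob in p.items():
            # Each site passes half of its probability to each neighbour.
            nxt[m - 1] = nxt.get(m - 1, 0.0) + 0.5 * prob
            nxt[m + 1] = nxt.get(m + 1, 0.0) + 0.5 * prob
        p = nxt
    return p

def closed_form(n, m, s):
    """Eq. 3.16 with nu = |m - n|; the ratio of factorials is a binomial coefficient."""
    nu = abs(m - n)
    if nu > s or (nu + s) % 2:
        return 0.0
    return comb(s, (s + nu) // 2) / 2**s

p_iterated = walk_distribution(n=3, steps=10)
max_error = max(abs(p_iterated.get(m, 0.0) - closed_form(3, m, 10))
                for m in range(-10, 17))
```

The dictionary is keyed by the final site $m$ only, which is the short-hand $p_m(s)$ mentioned above: the dependence on the starting site $n$ is carried implicitly by the initial condition.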
Figure 3.3: Bernoulli’s urn model. A) Two urns, A and B, each
contain $n$ balls, with $n$ of the $2n$ balls white and $n$ black. A ball is drawn
at random from each urn and then the ball that came from A is placed
in urn B, and the ball that came from B is placed in urn A. B) Laplace’s
approximation of the fraction of white balls in urn A ($\hat{x} = x/n$) after a
very long time. The distribution is Gaussian, with standard deviation
$1/\sqrt{8n}$.


3.6 Example — Bernoulli’s Urns and Recurrence


• D. Bernoulli (1770) “Disquisitiones analyticae de novo problemate conjecturali,”
Novi Commentarii Academiae Scientiarum Imperialis Petropolitanae XIV: 3-25.


• P. S. Laplace (1812) Théorie Analytique des Probabilités. Livre II. Théorie
Générale des Probabilités, Chapter 3.


• M. Jacobsen (1996) “Laplace and the origin of the Ornstein-Uhlenbeck process,”
Bernoulli 2: 271-286.


The following problem was posed by Bernoulli (see Figure 3.3): 


There are $n$ white balls and $n$ black balls in two urns, and
each urn contains $n$ balls. The balls are moved cyclically, one
by one, from one urn to another. What is the probability $z_{x,r}$
that after $r$ cycles, urn A will contain $x$ white balls?


In attempting to solve this problem, Laplace derived the following difference
equation for the evolution of $z_{x,r}$; in words,


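\[
\begin{aligned}
\left[ \begin{array}{c} \text{Prob. } x \text{ white balls} \\ \text{in urn A after} \\ r+1 \text{ cycles} \end{array} \right] =\;
& \left[ \begin{array}{c} \text{Prob. } x+1 \text{ white balls} \\ \text{in urn A after} \\ r \text{ cycles} \end{array} \right] \cdot
\left[ \begin{array}{c} \text{Prob. of} \\ \text{drawing a} \\ \text{white ball} \\ \text{from urn A} \end{array} \right] \cdot
\left[ \begin{array}{c} \text{Prob. of} \\ \text{drawing a} \\ \text{black ball} \\ \text{from urn B} \end{array} \right] \\
+\; & \left[ \begin{array}{c} \text{Prob. } x-1 \text{ white balls} \\ \text{in urn A after} \\ r \text{ cycles} \end{array} \right] \cdot
\left[ \begin{array}{c} \text{Prob. of} \\ \text{drawing a} \\ \text{black ball} \\ \text{from urn A} \end{array} \right] \cdot
\left[ \begin{array}{c} \text{Prob. of} \\ \text{drawing a} \\ \text{white ball} \\ \text{from urn B} \end{array} \right] \\
+\; & \left[ \begin{array}{c} \text{Prob. } x \text{ white balls} \\ \text{in urn A after} \\ r \text{ cycles} \end{array} \right] \cdot
\left[ \begin{array}{c} \text{Prob. of} \\ \text{drawing a} \\ \text{white ball} \\ \text{from urn A} \end{array} \right] \cdot
\left[ \begin{array}{c} \text{Prob. of} \\ \text{drawing a} \\ \text{white ball} \\ \text{from urn B} \end{array} \right] \\
+\; & \left[ \begin{array}{c} \text{Prob. } x \text{ white balls} \\ \text{in urn A after} \\ r \text{ cycles} \end{array} \right] \cdot
\left[ \begin{array}{c} \text{Prob. of} \\ \text{drawing a} \\ \text{black ball} \\ \text{from urn A} \end{array} \right] \cdot
\left[ \begin{array}{c} \text{Prob. of} \\ \text{drawing a} \\ \text{black ball} \\ \text{from urn B} \end{array} \right],
\end{aligned}
\tag{3.17}
\]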


where the first term represents the loss of a white ball from urn A, the
second term represents the gain of a white ball to urn A, and the remaining
terms contribute no change in the number of white balls in urn A.
The difference equation for this process is (much) more concisely written
as,


\[ z_{x,r+1} = \left( \frac{x+1}{n} \right)^2 z_{x+1,r} + 2\,\frac{x}{n} \left( 1 - \frac{x}{n} \right) z_{x,r} + \left( 1 - \frac{x-1}{n} \right)^2 z_{x-1,r}. \tag{3.18} \]


Notice that in contrast to the master equation for the simple birth-death
process described in the preceding section, Eq. 3.18 has nonlinear transition
rates. If the transition rates are nonlinear, it is very difficult to
find an explicit solution for the probability $z_{x,r}$, so Laplace sought an
approximate solution instead. His approach will be described in some detail
because the method is essentially the same as van Kampen’s linear noise
approximation, which we will meet again in Section 5.1.

First, Laplace assumes the number of balls $n$ is large, so that changes
$x \pm 1$ are almost infinitesimal with respect to the density of white balls
$x/n$, allowing the differences to be replaced with differentials,


\[ z_{x\pm 1,r} \approx z_{x,r} \pm \frac{\partial}{\partial x} z_{x,r} + \frac{1}{2} \frac{\partial^2}{\partial x^2} z_{x,r}, \]
\[ z_{x,r+1} \approx z_{x,r} + \frac{\partial}{\partial r} z_{x,r}. \tag{3.19} \]


Second, Laplace makes a change of variables, $r = n r'$, and


\[ x = \frac{1}{2} n + \sqrt{n}\,\mu; \tag{3.20} \]


we shall see in Section 5.1 that this change of variables is tantamount
to assuming that the density of white balls, $x/n$, is distributed about
the “deterministic” value 1/2, with some “fluctuations” parameterized
by $\mu$, and that the magnitude of these fluctuations scales with $\sqrt{n}$. With
Eq. 3.20, the nonlinear transition rates can be expanded in powers of $1/n$,
which, along with Eq. 3.19, leads to a linear partial differential equation
governing the distribution of $\mu$, $U(\mu, r')$,


\[ \frac{\partial U}{\partial r'} = 2 \frac{\partial}{\partial \mu} \left( \mu U \right) + \frac{1}{4} \frac{\partial^2 U}{\partial \mu^2}. \tag{3.21} \]

This is an example of a Fokker-Planck equation (cf. Chapter 6; Eq. 6.28),
and in particular, it characterizes what is called an Ornstein-Uhlenbeck
process which we met in Chapter 1 with Ornstein and Uhlenbeck’s study
of the velocity of a Brownian particle (cf. Section 1.2.4). Assuming that $n$
is large enough that the range of $\mu$ is unbounded ($\mu \in (-\infty, \infty)$), one can
easily verify that the steady-state solution of Eq. 3.21 is the Gaussian²,


\[ \lim_{r' \to \infty} U(\mu, r') = U(\mu) = \frac{2}{\sqrt{\pi}}\, e^{-4\mu^2}. \tag{3.22} \]


Reverting to the original variables by writing a continuous probability
density $p(x/n)$ for the fraction of white balls in urn A ($\hat{x} = x/n$), and
using Eq. 3.22,


\[ p(\hat{x}) = 2 \sqrt{\frac{n}{\pi}}\, \exp\left[ -4n \left( \hat{x} - \frac{1}{2} \right)^2 \right], \tag{3.23} \]


?The connection between the work of Ornstein and Uhlenbeck becomes apparent if 


one makes the substitution Eat = 8 in Eq. 1.34. 
which is a Gaussian distribution, with mean 1/2 and variance $1/8n$. As
the number of balls increases ($n \to \infty$), the density becomes more narrowly
peaked at $\hat{x} = 1/2$, approaching a delta-function. In Section 4.1.2
(on page 84), we shall see that Laplace’s approximation is extremely good
all the way down to $n = 10$.
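
Laplace's variance can be checked without any approximation by iterating the difference equation for the urn model directly. The sketch below assumes the transition probabilities described in words in Eq. 3.17 (white out and black in, black out and white in, like-for-like swap); the urn size `n = 50`, the initial condition and the helper name `urn_step_matrix` are hypothetical choices.

```python
import numpy as np

def urn_step_matrix(n):
    """One-cycle transition matrix T[x_new, x_old]; x = number of white balls in urn A."""
    T = np.zeros((n + 1, n + 1))
    for x in range(n + 1):
        f = x / n                           # fraction of white balls in urn A
        if x > 0:
            T[x - 1, x] = f * f             # white leaves A, black arrives from B
        if x < n:
            T[x + 1, x] = (1 - f) ** 2      # black leaves A, white arrives from B
        T[x, x] = 2 * f * (1 - f)           # balls of the same colour are swapped
    return T

n = 50
z = np.zeros(n + 1)
z[n] = 1.0                                  # assumed start: all white balls in urn A
T = urn_step_matrix(n)
for _ in range(40 * n):                     # many relaxation times
    z = T @ z

frac = np.arange(n + 1) / n                 # fraction of white balls, x / n
mean = (frac * z).sum()
var = ((frac - mean) ** 2 * z).sum()
ratio = var * 8 * n                         # close to 1 if the variance is 1 / (8 n)
```

For this urn size the long-time variance of $x/n$ agrees with $1/8n$ to within about one percent, even though the initial state is as far from equilibrium as possible.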

A model very similar to Bernoulli’s urn model was proposed at the beginning
of the 20th century by Paul and Tatiana Ehrenfest to clarify some
foundational concerns that continued to undermine Boltzmann’s formulation
of statistical mechanics (see Exercise 7). The trouble lay with the
precise meaning of irreversibility in thermodynamics. For example, a hot 
liquid will spontaneously, and irreversibly, lose heat to the surrounding 
environment until the temperature of the liquid is in equilibrium with 
the surroundings. Yet, if we treat the system as a microscopic dynamical 
system, Poincaré proved that almost every state of the system will be 
revisited eventually, to an arbitrarily prescribed degree of accuracy (this 
is Poincaré’s Wiederkehrsatz). Zermelo argued that the irreversibility 
of thermodynamics and the recurrence properties of dynamical systems 
are incompatible. Boltzmann replied that Poincaré’s Wiederkehrsatz is a 
mathematical result, true in the limit of infinite time, but that in practice
the time-span between recurrences of an unlikely state is unimaginably
long, and so the statistical mechanical processes underlying thermodynamics
are irreversible for all intents and purposes. One can make
Boltzmann’s argument quantitative using the following simple urn model, 


Imagine $2R$ balls, numbered consecutively from 1 to $2R$, distributed
in two boxes (I and II), so that at the beginning
there are $R + n$ ($-R \le n \le R$) balls in box I. We choose a
random integer between 1 and $2R$ (all integers are supposed to
be equiprobable) and move the ball, whose number has been
drawn, from the box in which it is to the other box. This process
is repeated $s$ times and we ask for the probability $Q_{R+m,s}$
that after $s$ drawings there will be $R + m$ balls in box I.


M. Kac proves in his classic paper on Brownian motion (M. Kac (1947) 
“Random walk and the theory of Brownian motion,” The American Math- 
ematical Monthly 54: 369-391) that irrespective of how the initial state 
is prepared, every possible state is visited with probability one — this is 
Poincaré’s Wiederkehrsatz. Nevertheless, Kac also shows that excursions 
from equilibrium return exponentially quickly (Newton’s law of cooling) 
and that if the number of balls is large and the initial state is far from 
equilibrium, then the recurrence time is enormously long. Specifically,
the number of draws, $s_{\text{recur}}$, required on average for the recurrence of a
state beginning with $R + n$ balls in one urn is given by,


\[ s_{\text{recur}} = \frac{(R+n)!\,(R-n)!}{(2R)!}\; 2^{2R}. \]


If, for example, we begin with $R = 10000$ and $n = 10000$ (i.e. all 20000
balls in Urn I), and each drawing takes 1 second, then on average we will
have to wait more than $10^{6000}$ years (!) for this state to re-occur. On the
other hand, close to equilibrium, neighbouring states are visited often: if
we begin with $R = 10000$ and $n = 0$ (i.e. half of the 20000 balls in Urn I),
then on average we must wait about $100\sqrt{\pi} \approx 177$ seconds for the state
to re-occur.


Figure 3.4: Noise in chemical reaction kinetics.
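
The two recurrence times quoted above are easy to reproduce, provided the factorials are evaluated through their logarithms to avoid overflow. This is a minimal sketch; the function name and the length assumed for a year are illustrative.

```python
from math import lgamma, log, log10

def log10_recurrence_draws(R, n):
    """log10 of s_recur = (R+n)! (R-n)! / (2R)! * 2**(2R), using log-gamma."""
    ln_s = (lgamma(R + n + 1) + lgamma(R - n + 1)
            - lgamma(2 * R + 1) + 2 * R * log(2))
    return ln_s / log(10)

SECONDS_PER_YEAR = 365.25 * 24 * 3600      # assumed length of a year

# Far from equilibrium: all 20000 balls in one box, one draw per second.
far_log10_draws = log10_recurrence_draws(10000, 10000)
far_log10_years = far_log10_draws - log10(SECONDS_PER_YEAR)

# At equilibrium: the balls are split evenly between the boxes.
near_draws = 10 ** log10_recurrence_draws(10000, 0)
```

With one draw per second, `far_log10_years` is the base-10 exponent of the waiting time in years for the far-from-equilibrium state, while `near_draws` is the waiting time in seconds for the evenly split state, which by Stirling's formula is close to $\sqrt{\pi R}$.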


3.7 Example — Chemical Kinetics 


• D. McQuarrie (1967) “Stochastic approaches to chemical kinetics,” Journal of
Applied Probability 4: 413.


• J. Elf and M. Ehrenberg (2003) “Fast evaluation of fluctuations in biochemical
networks with the linear noise approximation,” Genome Research 13: 2475.


• D. T. Gillespie (1977) “Exact stochastic simulation of coupled chemical reactions,”
Journal of Physical Chemistry 81: 2340.


Very often, the kinetics of chemical reactions are described in terms of 
deterministic chemical rate equations, which take the form of a system 
of coupled nonlinear differential equations. Underlying that formulation 
is the implicit assumption that the concentration of the reactants varies 
both continuously and differentiably. For moles of reactants (i.e. molecule
numbers of the order $10^{23}$), these assumptions are perfectly justified since
a change of one or two molecules in a population of $10^{23}$ is, for all intents
and purposes, infinitesimal. That accounts for the great success of
deterministic models in most macroscopic systems, including freshman 
chemistry labs (Figure 3.4). For small pools of reactants, however, the 
mathematical formulation becomes more delicate. 

Inside living cells, reactant numbers tend to be of the order 1-1000. A 
reaction altering the population by one or two therefore generates a large 
relative change, and the molecule numbers no longer evolve differentiably 
(Figure 3.5). Furthermore, reactions no longer occur ‘continuously’ over 
an infinitely small time interval, but rather progress in a series of steps 
of finite time width. By way of analogy, one can imagine the national 
birth rate as compared to the chances my next-door neighbor will have 
a baby. One often hears statements such as: “Every 10 minutes, a baby 
is born in this country.” That clearly cannot be true of my next-door 
neighbor. Evolution of the population of an entire country can be well- 
described using differential equations, but the evolution of the population 
of a small town occurs in a stochastic, step-wise manner. This example 
illustrates the dichotomy of discrete evolution of individuals on the one 
hand, and (nearly) continuous evolution of the population density on the 
other. We can make this relationship explicit by writing the number of
individuals $n$ as being proportional to a density $X$, with the constant of
proportionality $\Omega$ being a measure of the system size,


\[ n = \Omega \cdot X. \]


In the urn example above, $\Omega$ was the total number of balls in one urn.
In chemical reaction dynamics, $\Omega$ is usually the volume of the reaction
vessel.

It is straightforward to argue that in the gas phase (or a very dilute
solution), the dynamics of chemical reaction networks can be reasonably
described by a master equation governing the probability density for the
molecule numbers $\mathbf{n}$,


\[ \frac{\partial P(\mathbf{n},t)}{\partial t} = \sum_{\mathbf{n}'} \left[ w_{\mathbf{n}\mathbf{n}'}\, P(\mathbf{n}',t) - w_{\mathbf{n}'\mathbf{n}}\, P(\mathbf{n},t) \right]. \tag{3.24} \]


As above, $w_{\mathbf{n}\mathbf{n}'}$ denotes the transition probability from the state $\mathbf{n}'$ to
the state $\mathbf{n}$. In contrast with examples studied so far, $P(\mathbf{n},t)$ is usually a
multivariate probability distribution, depending upon state variables for
all the products and reactants of interest in the network. The transition
probabilities $w_{ij}$ are generalizations of the deterministic reaction rates,
and in fact we shall find that the coefficients appearing in the deterministic
rate equations are simply proportional to the mean of an exponential
distribution.

Figure 3.5: Noise in chemical reaction kinetics. a) For many reactant
molecules, the species concentrations evolve both continuously and
differentiably. b) When small numbers of reactants are involved, due to
the probabilistic nature of individual reaction events and the finite change
in molecule numbers incurred, the concentration evolves step-wise. As the
reactant numbers increase, however, the relative size of the jumps
decreases. c) Repeating an experiment many times, we typically obtain
some repeatable averaged behavior that conforms very nearly to the
deterministic description and some envelope around the average that accounts
for the fluctuations in individual trajectories. Inset: We can imagine
the full evolution of the system as composed of two parts: the deterministic
evolution of $[C](t)$ and a probability distribution for the fluctuations
that moves along $[C](t)$. The width of the probability distribution scales
roughly as $1/\sqrt{N}$ where $N$ is the number of molecules.

For multidimensional master equations evolving over a discrete state
space, it is convenient to introduce several new objects: the step-operator
$\mathbf{E}$, the stoichiometry matrix $\mathbf{S}$ and the propensity vector $\boldsymbol{\nu}$.

The step-operator $\mathbf{E}$ is short-hand for writing evolution over discrete
space. The action of the operator is defined in the following way: $\mathbf{E}_i^k$
increments the $i$-th variable by an integer $k$. That is, for a function
$f(n_1, n_2, \ldots, n_i, \ldots)$ depending upon several variables,

\[ \mathbf{E}_i^k\, f(n_1, n_2, \ldots, n_i, \ldots) = f(n_1, n_2, \ldots, n_i + k, \ldots). \tag{3.25} \]


The stoichiometry matrix $\mathbf{S}$ describes how much each species changes
with the completion of a given reaction, while the propensity vector $\boldsymbol{\nu}$
describes the rate at which a particular reaction proceeds. An example
should make this notation more clear.


Example: Coupled Poisson processes. To introduce the stoichiometry
matrix and the propensity vector, consider a simple linear two-state
model:


\[
\begin{aligned}
n_1 &\to n_1 + 1, & \nu_1 &= \alpha_1, \\
n_1 &\to n_1 - 1, & \nu_2 &= \beta_1 \cdot n_1/\Omega, \\
n_2 &\to n_2 + 1, & \nu_3 &= \alpha_2 \cdot n_1/\Omega, \\
n_2 &\to n_2 - 1, & \nu_4 &= \beta_2 \cdot n_2/\Omega.
\end{aligned}
\tag{3.26}
\]


All of the transition rates are linear or constant making it possible to solve
for the probability distribution $P(n_1, n_2, t)$ exactly, at least in principle
(see Section 5.2 on page 113). We shall not attempt to do so here, but
focus instead on how to represent the reaction network in a convenient
manner.

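Generally, we record the reaction propensities in a vector $\boldsymbol{\nu}$ and the
stoichiometries in a matrix $\mathbf{S}$ defined such that when the $j$-th reaction
occurs it increments the $i$-th reactant by an integer $S_{ij}$: $n_i \to n_i + S_{ij}$.
The collection of the elements $S_{ij}$ compose the stoichiometry matrix. In
the present example,

\[ \boldsymbol{\nu} = \begin{pmatrix} \alpha_1 \\ \beta_1 \cdot n_1/\Omega \\ \alpha_2 \cdot n_1/\Omega \\ \beta_2 \cdot n_2/\Omega \end{pmatrix}, \qquad \mathbf{S} = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix}, \]

where each column of the stoichiometry matrix corresponds to a particular
reaction and each row to a particular reactant. Using this notation, in
the limit $\Omega \to \infty$ with $n_i/\Omega = x_i$ held constant, the deterministic rate
equations can be written in terms of the stoichiometry matrix and the
propensity vector,


\[ \frac{d\mathbf{x}}{dt} = \lim_{\Omega \to \infty} \mathbf{S} \cdot \boldsymbol{\nu}, \tag{3.27} \]

\[ \frac{dx_1}{dt} = \alpha_1 - \beta_1 x_1, \qquad \frac{dx_2}{dt} = \alpha_2 x_1 - \beta_2 x_2. \]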
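
The bookkeeping for this example can be sketched in a few lines. The stoichiometry matrix below is read off from the four reactions of Eq. 3.26 (one column per reaction, one row per species); the numerical rate constants, the system size and the function names are hypothetical.

```python
import numpy as np

# Columns: reactions 1..4 of Eq. 3.26; rows: species n1, n2.
S = np.array([[1, -1, 0,  0],
              [0,  0, 1, -1]])

def propensities(n, omega, a1, b1, a2, b2):
    """Propensity vector of Eq. 3.26, in units of concentration per time."""
    n1, n2 = n
    return np.array([a1, b1 * n1 / omega, a2 * n1 / omega, b2 * n2 / omega])

def rate_equations(x, a1, b1, a2, b2):
    """Deterministic rate equations written out by hand."""
    x1, x2 = x
    return np.array([a1 - b1 * x1, a2 * x1 - b2 * x2])

params = dict(a1=2.0, b1=0.5, a2=1.0, b2=0.25)    # hypothetical rate constants
omega = 100.0                                      # hypothetical system size
n = np.array([350.0, 900.0])                       # molecule numbers

from_matrix = S @ propensities(n, omega, **params)
by_hand = rate_equations(n / omega, **params)
```

Because every propensity in this example is constant or linear, the product of the stoichiometry matrix with the propensity vector reproduces the rate equations at any system size, not only in the limit of large $\Omega$.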


The real convenience of describing the reaction network in terms of $\mathbf{S}$ and
$\boldsymbol{\nu}$ comes in writing the master equation. Piecing together each term in
the equation from the various reactions can be tedious, but an explicit
expression exists. For a system of $R$ reactions and $N$ reactants (i.e.,
$\mathbf{n} \in \mathbb{R}^N$), the master equation is


\[ \frac{\partial P}{\partial t} = \Omega \sum_{j=1}^{R} \left[ \left( \prod_{i=1}^{N} \mathbf{E}_i^{-S_{ij}} \right) - 1 \right] \nu_j(\mathbf{n})\, P(\mathbf{n},t), \tag{3.28} \]


where $\nu_j$ is the microscopic reaction propensity (with units of concentration
per time) explained below. Furthermore, for numerical and analytic
approximations of the solution of the master equations, most schemes are
concisely written in terms of $\mathbf{S}$ and $\boldsymbol{\nu}$.
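
To make the step-operator notation concrete, here is a minimal sketch of the right-hand side of the master equation for the coupled Poisson example, assembled reaction by reaction from the stoichiometry matrix and the propensities. Two assumptions are made: the propensities carry units of concentration per time, so each probability flux is multiplied by the system size, and the state space is truncated to a finite grid, with jumps that would leave the grid simply dropped. All names and numerical values are hypothetical.

```python
import itertools
import numpy as np

# Columns: reactions 1..4 of Eq. 3.26; rows: species n1, n2.
S = np.array([[1, -1, 0,  0],
              [0,  0, 1, -1]])

def propensities(n1, n2, omega, a1, b1, a2, b2):
    """Propensities of Eq. 3.26, in units of concentration per time."""
    return (a1, b1 * n1 / omega, a2 * n1 / omega, b2 * n2 / omega)

def master_rhs(P, omega, rates):
    """dP/dt on a truncated grid P[n1, n2], built one reaction at a time."""
    dP = np.zeros_like(P)
    M1, M2 = P.shape
    for n1, n2 in itertools.product(range(M1), range(M2)):
        nu = propensities(n1, n2, omega, *rates)
        for j in range(S.shape[1]):
            m1, m2 = n1 + S[0, j], n2 + S[1, j]    # state reached by reaction j
            if 0 <= m1 < M1 and 0 <= m2 < M2:
                flux = omega * nu[j] * P[n1, n2]
                dP[m1, m2] += flux                 # gain of the destination state
                dP[n1, n2] -= flux                 # loss from the source state
    return dP

P0 = np.zeros((20, 20))
P0[0, 0] = 1.0                                     # start with no molecules
dP0 = master_rhs(P0, omega=10.0, rates=(2.0, 0.5, 1.0, 0.25))
```

Shifting the argument by minus the stoichiometry, as the step operator in the master equation does, is the same as pushing probability forward from each state to the state it reaches, which is how the loop is organized; the paired gain and loss entries guarantee that total probability is conserved.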


Brief note about microscopic reaction propensities $\nu$


For nonlinear transition rates, the propensity appearing in the master
equation is not identical with the propensity in the deterministic rate
equations (Eq. 3.27). The difference is not difficult to understand. Consider,
for example, the dimerization reaction,


\[ X + X \xrightarrow{\;a\;} X_2. \]

In a deterministic system, where the number of molecules is very large,
the rate of accumulation of the product $X_2$ is written,


\[ \frac{d}{dt} [X_2] = a\,[X]^2. \tag{3.29} \]

Microscopically, what we mean by a reaction event is that two molecules
of $X$ find one another with sufficient energy that they form the dimer
$X_2$. The probability for the reaction to occur is then proportional to the
number of ways two molecules can collide,


\[ \frac{d}{dt} [X_2] \propto \frac{1}{2}\, n_X \left( n_X - 1 \right), \tag{3.30} \]

where $n_X$ is the number of molecules of $X$ in a reaction vessel of volume
$\Omega$. The last term on the right-hand side is $(n_X - 1)$ because we need at
least two molecules to have a reaction, and the 1/2 factor comes from
not double-counting each reactant: one molecule colliding with another
is the same as the other colliding with the one. Here, and throughout,
the fraction accounting for different permutations of the reactants will
be absorbed into the rate of reaction so that we can write, in units of
concentration per time,


\[ \nu\!\left( \frac{n_X}{\Omega} \right) = \frac{a'}{2} \cdot \frac{n_X}{\Omega} \cdot \frac{n_X - 1}{\Omega} = a\, \frac{n_X (n_X - 1)}{\Omega^2}, \qquad \text{(microscopic reaction rate)} \]

\[ \tilde{\nu}([X]) = a \cdot [X] \cdot [X], \qquad \text{(macroscopic reaction rate)} \tag{3.31} \]


(with $a'/2 = a$). In numerical simulation of chemical reaction kinetics
(Section 4.2.1), the microscopic reaction propensities $\nu$ are used, while
in asymptotic solutions of the master equation (Section 5.1), to the order
of approximation that we will be concerned with in this course, the
macroscopic propensities $\tilde{\nu}$ are sufficient.
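
The size of the discrepancy between the two propensities is easy to quantify. The sketch below evaluates both expressions of Eq. 3.31 at a fixed concentration for a few system sizes; the rate constant, the concentration and the function names are hypothetical.

```python
def microscopic_propensity(n_x, omega, a):
    """nu = a * n_X * (n_X - 1) / Omega**2, in units of concentration per time."""
    return a * n_x * (n_x - 1) / omega**2

def macroscopic_propensity(conc, a):
    """nu_tilde = a * [X]**2."""
    return a * conc**2

a = 1.0        # hypothetical rate constant
conc = 2.0     # fixed concentration [X] = n_X / Omega

relative_gap = {}
for omega in (5, 50, 5000):
    n_x = int(conc * omega)
    micro = microscopic_propensity(n_x, omega, a)
    macro = macroscopic_propensity(n_x / omega, a)
    relative_gap[n_x] = (macro - micro) / macro    # works out to 1 / n_X
```

The relative difference is exactly $1/n_X$: it matters for a handful of molecules and is negligible for macroscopic populations, which is why the macroscopic propensity suffices at leading order in the system size.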


Suggested References 


The text by Gardiner,


• Handbook of stochastic methods (3rd Ed.), C. W. Gardiner (Springer,
2004),


has a thorough discussion of Markov processes. The text by Gillespie,

• Markov Processes, D. T. Gillespie (Academic Press, Inc, 1992),


uses a distinctive approach to Markov processes that some readers may
enjoy. The mathematics is a bit heavy, but many aspects of Markov
processes are united into a single framework.

The pedagogical article by Kac (pronounced ‘Katz’),

• M. Kac (1947) “Random walk and the theory of Brownian motion,”
The American Mathematical Monthly 54: 369-391,

is a classic, as is the lengthy review by Chandrasekhar,


• S. Chandrasekhar (1943) “Stochastic problems in physics and astronomy,”
Reviews of Modern Physics 15: 1-89.


Finally, Elf and Ehrenberg present the stoichiometry matrix and propensity
vector in the context of several applications,


• J. Elf and M. Ehrenberg (2003) “Fast evaluation of fluctuations in
biochemical networks with the linear noise approximation,” Genome
Research 13: 2475.


Exercises 


1. Wiener process: Use the defining features of the Wiener process
$X(t)$, i.e. the stationary-independent increments and the 1st-order
probability density given by (Eq. 3.6),


\[ f(x,t) = \frac{1}{\sqrt{4\pi D t}} \exp\left[ -\frac{x^2}{4Dt} \right], \quad (t \ge 0), \]

to show the following:

(a) The autocovariance is given by,

\[ \langle\langle X(t_1) X(t_2) \rangle\rangle = 2D \min(t_1, t_2). \]


(b) The increments of the Wiener process are uncorrelated on
disjoint time-intervals,


\[ \left\langle [X(t_4) - X(t_3)]\,[X(t_2) - X(t_1)] \right\rangle = 0, \quad \text{for any } t_4 > t_3 > t_2 > t_1. \]


2. Ornstein-Uhlenbeck process: In the following, use the defining
expressions for the Ornstein-Uhlenbeck process $X(t)$ (Eq. 3.8),


\[ f(x) = \frac{1}{\sqrt{\pi (D/\beta)}} \exp\left[ -\frac{x^2}{(D/\beta)} \right], \]

\[ p(x_2|x_1,\tau) = \frac{1}{\sqrt{\pi (D/\beta) \left( 1 - e^{-2\beta\tau} \right)}} \exp\left[ -\frac{\left( x_2 - x_1 e^{-\beta\tau} \right)^2}{(D/\beta) \left( 1 - e^{-2\beta\tau} \right)} \right], \]


(where $\tau \equiv t_2 - t_1 > 0$).

(a) Compute explicitly the autocorrelation function, $\langle X(t_1) X(t_2) \rangle$.

(b) Show that the compatibility condition is satisfied for $t_2 > t_1$,


\[ \int_{-\infty}^{\infty} f(x_1, x_2; t_1, t_2)\,dx_1 = f(x_2; t_2). \]


(c) Show that the increments of the Ornstein-Uhlenbeck process
are negatively correlated on disjoint time intervals,


\[ \left\langle [X(t_4) - X(t_3)]\,[X(t_2) - X(t_1)] \right\rangle < 0, \quad \text{for any } t_4 > t_3 > t_2 > t_1. \]

(d) If $Z(t)$ is the integral of the Ornstein-Uhlenbeck process $X(t)$,


\[ Z(t) = \int_0^t X(\tau)\,d\tau, \]


then find $\langle Z(t_1) Z(t_2) \rangle$.

(e) If $Z(t)$ is the integral of the Ornstein-Uhlenbeck process $X(t)$,
then find $\langle \cos[Z(t_1) - Z(t_2)] \rangle$. Hint: It may be convenient to
use cumulants.


3. Chapman-Kolmogorov equation: Consider a stochastic process
$X(t)$ which starts from $X(0) = 0$, takes continuous values $x$ and is
homogeneous in time and space (i.e., has stationary independent
increments), such that $f_{1|1}(x_2, t_2|x_1, t_1) = f(x_2 - x_1, t_2 - t_1)$ for
$t_2 > t_1 > 0$, where $f(x,t)$ is the first-order probability density.


(a) Show that the Chapman-Kolmogorov equation,


\[ f(x,t) = \int_{-\infty}^{\infty} f(x - y, t - \tau)\, f(y, \tau)\,dy, \]


is satisfied if the first-order cumulant-generating function has
the form: $\ln G(k,t) = t\,g(k)$, where $g(k)$ is an arbitrary function
of $k$.

Hint: Use the Fourier transform to convert the Chapman-Kolmogorov
equation into a functional equation for the characteristic
function $G(k,t)$ of the process.

(b) Assuming that the time evolution of the process is governed
by the (known) transition probability per time of the form
$w(x'|x) = w(x' - x)$, apply the Fourier transform to the master
equation for $f(x,t)$ and show that,


\[ f(x,t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp\left[ -ikx + t \int_{-\infty}^{\infty} w(x') \left( e^{ikx'} - 1 \right) dx' \right] dk. \]
4. Dichotomic Markov process (Random telegraph process):
The function $Y(t)$ is a Markov process with range $\{-1, 1\}$, switching
between the two at a rate $\gamma$.


(a) Construct the master equation for the dichotomic Markov process.

(b) Solve the master equation with the initial condition $y(t_0) = y_0$
to obtain the conditional probability density $P(y, t|y_0, t_0)$.

(c) Show that this is consistent with the single-point probability
$P(y,t) = \frac{1}{2} \left( \delta_{y,-1} + \delta_{y,1} \right)$, where $\delta_{ij}$ is the Kronecker delta.

(d) Repeat the above for a Markov process $Y(t)$ with a range $\{a, b\}$
and asymmetric transition probabilities: $a \to b$ and $b \to a$ (with unequal rates).
Compute the steady-state autocorrelation function for this process.


5. Lead-probability: Suppose that in a coin-toss game with unit 
stakes, Albert bets on heads, and Paul bets on tails. The probability 
that Albert leads in $2r$ out of $2n$ tosses is called the lead probability
$P_{2r,2n}$. One can show that,

$$P_{2r,2n} = \binom{2r}{r}\binom{2n-2r}{n-r}\,2^{-2n}.$$

(a) For a coin tossed 20 times, what is the most likely number of
tosses for which Albert is in the lead? Guess first, then make
a table of $P_{2r,2n}$ for $n = 10$ and $r = 1, 2, \ldots, 10$.

(b) Show that the probability that $r/n < x$ is given by,

$$f(x) \sim \frac{2}{\pi}\arcsin\sqrt{x},$$

as $n \to \infty$. For a game where the coin is tossed every second
for one year, what is the probability that one of the players
will be in the lead less than one day out of 365?

(c) For a one-dimensional random walk, would you expect a given
realization to spend most of its time on the negative axis, the
positive axis, or to hover near the origin, spending equal time
on the positive and negative axes?


6. Bernoulli’s urn model and the approximation of Laplace: 
The urn model of Bernoulli has nonlinear transition rates, and there- 
fore cannot be solved in general. Laplace, by a clever substitution, 
derived an approximate evolution equation. 
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(a) 


Solve Laplace's partial differential equation, Eq. 3.21, at steady-state.
That is, set the left-hand side equal to 0, and solve
the resulting ordinary differential equation for $U(\mu)$ assuming
$U(\mu), U'(\mu) \to 0$ exponentially as $\mu \to \pm\infty$. Comparing
the solution with the canonical Gaussian probability density
(cf. Eq. A.19 on page 267), and using the change of variables
Eq. 3.20, what is the variance of the steady-state distribution
for the white ball density $\phi = x/n$? What can you say about
the equilibrium state as $n \to \infty$?

(b) Multiply Eq. 3.21 by $\mu$ and integrate over $\mu \in (-\infty, \infty)$ to obtain
an evolution equation for the average $\langle\mu\rangle$. (Use the same
boundary conditions as above: $U(\mu), U'(\mu) \to 0$ exponentially
as $\mu \to \pm\infty$.) From the resulting equation, show that $\langle\mu\rangle$
returns to $\langle\mu\rangle = 0$ exponentially quickly; i.e., deviations from the
equilibrium state $x = \frac{1}{2}n$ are restored exponentially quickly
(Newton's law of cooling). Hint: Integrate by parts, assuming
that the probability distribution and its first derivative both
vanish as $\mu \to \pm\infty$.

(c) Repeat the above, but this time multiply by $\mu^2$ and integrate
over $\mu \in (-\infty, \infty)$ to obtain an evolution equation for the
variance $\langle\mu^2\rangle$. Show that the variance asymptotically approaches
a non-zero steady-state. How does this steady-state compare
with the results obtained in question 6a?


7. Ehrenfests’ urn model: The urn model proposed by Paul and 
Tatiana Ehrenfest to clarify the idea of irreversibility in thermody- 
namics is the following: 


Imagine 2R balls, numbered consecutively from 1 to 2R, 
distributed in two boxes (I and II), so that at the beginning
there are $R + n$ ($-R \le n \le R$) balls in box I.
We choose a random integer between 1 and 2R (all inte- 
gers are supposed to be equiprobable) and move the ball, 
whose number has been drawn, from the box in which it 
is to the other box. This process is repeated s times and 
we ask for the probability $Q_{R+m,s}$ that after $s$ drawings
there will be $R + m$ balls in box I.


(a) Write out the master equation governing the evolution of the
probability distribution $Q_{R+m,s}$.

(b) Following the method of Laplace discussed in Section 3.6, define
a suitable continuous fluctuation variable $\mu$ and derive a
partial differential equation for the evolution of the probability
distribution for $\mu$.

(c) Multiply the partial differential equation derived in 7b by $\mu$ and
integrate over $\mu \in (-\infty, \infty)$ to obtain an evolution equation
for the average $\langle\mu\rangle$. From the resulting equation, show that
$\langle\mu\rangle$ returns to $\langle\mu\rangle = 0$ exponentially quickly; i.e., deviations
from the equilibrium state are restored exponentially quickly
(Newton's law of cooling). Hint: Integrate by parts, assuming
that the probability distribution and its first derivative both
vanish exponentially as $\mu \to \pm\infty$.


8. Malthus’s Law: Malthus’s law for population growth assumes that 
the birth and death rates are proportional to the number of individuals,
respectively, $b(n) = \beta n$ and $d(n) = \alpha n$, with $\alpha$ and $\beta$ given
constants.

(a) Use the probability generating function, Eq. 1.47, to solve the
master equation for $f(n,t) = P\{N(t) = n\,|\,N(0) = n_0\}$ and
show that, in the case $\alpha = \beta$, the probability of extinction
goes as,

$$f(0,t) = \left[1 + \frac{1}{\beta t}\right]^{-n_0},$$

where $n_0$ is the initial population.

(b) Solve the deterministic rate equation for $\langle N(t)\rangle$ and $\langle N^2(t)\rangle$.
Describe the time-dependence of the variance $\sigma^2(t)$ for the
cases when $\beta > \alpha$, $\beta < \alpha$, and $\beta = \alpha$.


9. Step-operator (Eq. 3.25): 


(a) Prove the identity

$$\sum_{N=0}^{\infty} g(N)\,\mathbf{E} f(N) = \sum_{N=1}^{\infty} f(N)\,\mathbf{E}^{-1} g(N) \tag{3.32}$$

for any pair of functions $f, g$ such that the sum converges.

(b) Consider the decay of a radioactive isotope

$$X \longrightarrow A, \tag{3.33}$$

where $A$ is inert and does not enter into the equations.
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i. What is the deterministic equation describing this process? 
ii. Write out the master equation governing the probability 
distribution of X. 
iii. Compute the equation governing the mean $\langle X\rangle$ by multiplying
the master equation by $X$ and summing over all
$X$. Use Eq. 3.32 to show that the mean satisfies the
deterministic rate equation. This is a general characteristic
of master equations with linear transition rates.


10. Chemical reaction networks: The time-evolution of species in a 
chemical reaction network is often represented by a Markov process, 
characterized by the master equation. 


(a) Write out the master equation for the simple linear network 
(3.26). Try to do the same without using the explicit formula 
provided. 

(b) A "toggle switch" network consists of two mutually repressing
species, $r_1$ and $r_2$: if $r_1$ is high, synthesis of $r_2$ is low, and,
conversely, if $r_2$ is high, $r_1$ is kept low. A simple network
describing this system is the following four reactions:

$$r_1 \xrightarrow{\nu_1} r_1 + 1, \qquad \nu_1 = \alpha \cdot g_R(r_2/\Omega)$$
$$r_2 \xrightarrow{\nu_2} r_2 + 1, \qquad \nu_2 = \alpha \cdot g_R(r_1/\Omega)$$
$$r_1 \xrightarrow{\nu_3} r_1 - 1, \qquad \nu_3 = \beta \cdot \frac{r_1}{\Omega}$$
$$r_2 \xrightarrow{\nu_4} r_2 - 1, \qquad \nu_4 = \beta \cdot \frac{r_2}{\Omega},$$

where the function $g_R(x)$ is high if $x$ is low and low if $x$ is high.
Write out the stoichiometry matrix $\mathbf{S}$ and the propensity vector
$\boldsymbol{\nu}$ for the toggle switch. Write out the deterministic reaction
rate equations and the master equation.


CHAPTER 4


SOLUTION OF THE MASTER EQUATION


The master equation derived in Chapter 3 provides a foundation for most 
applications of stochastic processes. Although it is more tractable than 
the Chapman-Kolmogorov equation, it is still rare to find an exact solu- 
tion. For linear transition rates, the master equation can be transformed 
to a first-order partial differential equation from which it is sometimes 
possible to extract an exact solution. More often, the transition rates 
are nonlinear, and approximation methods are the only recourse. We 
shall explore two popular and powerful approximation methods: numer- 
ical simulation algorithms (Section 4.2.1) and a perturbation expansion 
called the linear noise approximation (Chapter 5). 


4.1 Exact Solutions 


There are very few general methods for solving the master equation. We
shall discuss two of these in detail. The first is called the moment
generating function and is used to transform the master equation into a linear
partial differential equation. The method only works if the transition
probabilities are linear in the state variables. As such, it is restricted
to use on rather artificial and uninteresting examples. Furthermore, if
the dimensionality of the system is high, the algebra is formidable and
the auxiliary partial differential equation may not be amenable to solution
either. The second method we shall discuss relies upon re-writing
the master equation in vector-matrix notation. The steady-state solution
can then be computed as the eigenvector of a given transfer matrix. The
method is semi-exact since the eigenvector is usually computed by
numerical matrix iteration. Furthermore, the method becomes difficult to
implement if the dimensionality of the system is large or the range of the
state space is unbounded. Nevertheless, for one-dimensional systems, even
with nonlinear transition probabilities, the method can be very useful.


4.1.1 Moment Generating Functions 
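The moment generating function $Q(z,t)$, associated with the probability
$P(n,t)$, is a discrete version of the Laplace transform,

$$Q(z,t) = \sum_{n=0}^{\infty} z^n P(n,t), \tag{4.1}$$

very similar to the z-transform used by electrical engineers¹. The moment
generating function is so-named because the moments of $P(n,t)$ are
generated by successive derivatives of $Q(z,t)$, evaluated at $z = 1$,

$$Q(1,t) = 1, \qquad \text{[Normalization condition on } P(n,t)\text{]} \tag{4.2}$$

$$\left.\frac{\partial Q(z,t)}{\partial z}\right|_{z=1} = \left.\sum_{n=0}^{\infty} n\,z^{n-1} P(n,t)\right|_{z=1} = \langle n(t)\rangle, \tag{4.3}$$

$$\left.\frac{\partial^2 Q(z,t)}{\partial z^2}\right|_{z=1} = \left.\sum_{n=0}^{\infty} n(n-1)\,z^{n-2} P(n,t)\right|_{z=1} = \langle n^2(t)\rangle - \langle n(t)\rangle, \;\ldots \tag{4.4}$$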


Multiplying both sides of the master equation by z” and summing over all 
n allows the discrete-differential master equation for P(n,t) to be trans- 
formed into a partial differential equation for Q(z,t). A simple example 
should clarify the procedure. 


¹ The z-transform is usually defined over an unbounded domain, with $z^n$ appearing
in the denominator, i.e. for the sequence $\{x_n\}_{n=-\infty}^{\infty}$, the z-transform $X(z)$ is defined
as $X(z) = \sum_{n=-\infty}^{\infty} x_n / z^n$. See Chapter 3 of G. James "Advanced modern engineering
mathematics (3rd Ed.)," (Prentice-Hall, 2004), and Section B.4.3 on p. 294.


Example - Poisson process. Consider a simple birth-death process,
with constant birth rate $g_n = \alpha$ and linear death rate $r_n = \beta \times n$. Call
$P(n,t)$ the probability of finding the system in state $n$ at time $t$
(conditioned, as always, by some initial distribution). The master equation
corresponding to this process is,

$$\frac{dP(n,t)}{dt} = \underbrace{\left[\alpha P(n-1,t) + \beta(n+1)P(n+1,t)\right]}_{\text{Gain}} - \underbrace{\left[\beta n P(n,t) + \alpha P(n,t)\right]}_{\text{Loss}}, \tag{4.5}$$

or, rearranging slightly,

$$\frac{dP(n,t)}{dt} = \alpha\left[P(n-1,t) - P(n,t)\right] + \beta\left[(n+1)P(n+1,t) - nP(n,t)\right]. \tag{4.6}$$


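A particularly elegant short-hand, which shall be useful in the following
section on approximation methods, involves the step-operator $\mathbf{E}_i^k$ (recall
Eq. 3.25 on page 71). The operator acts by finding the $i$th entry of $\mathbf{n}$ and
incrementing it by an integer $k$:

$$\mathbf{E}_i^k f(\ldots, n_i, \ldots) = f(\ldots, n_i + k, \ldots). \tag{4.7}$$

Using the step-operator, Eq. 4.6 is written,

$$\frac{dP(n,t)}{dt} = \alpha\left[\mathbf{E}_1^{-1} - 1\right] P(n,t) + \beta\left[\mathbf{E}_1^{1} - 1\right]\left(n\,P(n,t)\right). \tag{4.8}$$

Multiplying by $z^n$,

$$\frac{d\,z^n P(n,t)}{dt} = \alpha\left[z\,\mathbf{E}_1^{-1} - 1\right] z^n P(n,t) + \beta\left[\mathbf{E}_1^{1} - z\right]\left(z^{n-1}\,n\,P(n,t)\right). \tag{4.9}$$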


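Under the transformation Eq. 4.1, Eq. 4.9 becomes a simple partial
differential equation for $Q(z,t)$,

$$\frac{\partial Q(z,t)}{\partial t} = \alpha(z-1)\,Q(z,t) - \beta(z-1)\,\frac{\partial Q(z,t)}{\partial z}. \tag{4.10}$$

We have transformed a discrete-differential equation (difficult) into a
linear first-order partial differential equation (easier). The full
time-dependent solution $Q(z,t)$ can be determined using what is called the
method of characteristics. Instead of the full time-dependent distribution,
we shall focus upon the first two moments in steady-state. Solving
the transformed Eq. 4.10 at steady-state, $\partial Q/\partial t = 0$, we have,

$$Q^s(z) = \exp\left[\frac{\alpha}{\beta}(z-1)\right]. \tag{4.11}$$

The steady-state moments follow immediately,

$$\left.\frac{\partial Q^s}{\partial z}\right|_{z=1} = \frac{\alpha}{\beta} = \langle n\rangle^s, \tag{4.12}$$

$$\left.\frac{\partial^2 Q^s}{\partial z^2}\right|_{z=1} = \left(\frac{\alpha}{\beta}\right)^2 = \langle n^2\rangle^s - \langle n\rangle^s, \tag{4.13}$$

with steady-state variance,

$$\sigma^2 = \frac{\alpha}{\beta}. \tag{4.14}$$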
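The step from Eq. 4.10 to the steady-state generating function can be spelled out in a few lines. The following is a short worked sketch that uses only Eq. 4.10 with the time derivative set to zero and the normalization condition Eq. 4.2; the integration constant $C$ is a symbol introduced here for illustration.

```latex
\begin{align*}
0 &= \alpha (z-1)\,Q^s(z) - \beta (z-1)\,\frac{dQ^s}{dz}
  \quad\Longrightarrow\quad
  \frac{dQ^s}{dz} = \frac{\alpha}{\beta}\,Q^s \quad (z \neq 1), \\
Q^s(z) &= C \exp\left[\frac{\alpha}{\beta}\,z\right],
  \qquad Q^s(1) = 1 \;\Longrightarrow\; C = \exp\left[-\frac{\alpha}{\beta}\right], \\
Q^s(z) &= \exp\left[\frac{\alpha}{\beta}\,(z-1)\right].
\end{align*}
```

Differentiating this result once and twice at $z = 1$, as in Eqs. 4.3 and 4.4, gives $\alpha/\beta$ and $(\alpha/\beta)^2$, so the mean and the variance are both $\alpha/\beta$.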


Having mean equal to the variance is the footprint of a Poisson process.
In fact, from the definition of the moment generating function (Eq. 4.1),
expanding $Q^s(z)$ as an infinite series,

$$Q^s(z) = \exp\left[\frac{\alpha}{\beta}(z-1)\right] = \exp\left[-\frac{\alpha}{\beta}\right]\sum_{n=0}^{\infty} \frac{z^n}{n!}\left(\frac{\alpha}{\beta}\right)^n,$$

we recognize the coefficients of the series as the probability distribution
of a Poisson process,

$$p_n = e^{-\mu}\,\frac{\mu^n}{n!},$$

with $\mu = \alpha/\beta$. We can measure how close to Poisson distributed a
given process is by considering the Fano factor, $\sigma^2/\langle n\rangle$. In our case (since
our process is Poisson distributed), the Fano factor is 1,

$$\frac{\sigma^2}{\langle n\rangle} = 1. \tag{4.15}$$

Alternately, the fractional deviation $\eta = \sqrt{\sigma^2/\langle n\rangle^2}$ is a dimensionless
measure of the fluctuations and often provides better physical insight than
the Fano factor. In the example above,

$$\eta = \frac{1}{\sqrt{\langle n\rangle}}, \tag{4.16}$$

substantiating the rule-of-thumb that relative fluctuations scale roughly
as the inverse square-root of the number of reactants (cf. Laplace's solution to
Bernoulli's urn problem, Eq. 3.20 on page 66). We shall exploit this
scaling in Section 5.1, discussing the linear noise approximation.


The moment generating function method will only work if the transition
rates are linear in the state variable, in complete analogy with
the Laplace transform for ordinary differential equations. A (semi)-exact
method that can be used in the case of nonlinear transition rates is matrix
iteration, a method we shall discuss in the next section.
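The Poisson result of the example above can be checked numerically. The sketch below is illustrative only: the function names and the parameter values $\alpha = 20$, $\beta = 2$ are assumptions made here, not taken from the text. It builds the Poisson distribution with $\mu = \alpha/\beta$, confirms that it makes the right-hand side of the master equation (Eq. 4.6) vanish, and evaluates the mean, the Fano factor, and the fractional deviation.

```python
import math

def poisson_pmf(mu, n_max):
    """p_n = exp(-mu) * mu**n / n! for n = 0..n_max, built iteratively."""
    p = [math.exp(-mu)]
    for n in range(1, n_max + 1):
        p.append(p[-1] * mu / n)
    return p

def master_equation_residual(p, alpha, beta):
    """Largest |right-hand side of Eq. 4.6| over the interior states."""
    worst = 0.0
    for n in range(1, len(p) - 1):
        rhs = alpha * (p[n - 1] - p[n]) + beta * ((n + 1) * p[n + 1] - n * p[n])
        worst = max(worst, abs(rhs))
    return worst

# Illustrative (assumed) parameter values.
alpha, beta = 20.0, 2.0
p = poisson_pmf(alpha / beta, 200)   # truncation at n = 200 is an assumption

mean = sum(n * pn for n, pn in enumerate(p))
var = sum(n * n * pn for n, pn in enumerate(p)) - mean ** 2
fano = var / mean                    # Eq. 4.15
eta = math.sqrt(var) / mean          # Eq. 4.16
residual = master_equation_residual(p, alpha, beta)
```

With these values the mean and variance both evaluate to $\alpha/\beta = 10$ up to round-off, so the Fano factor is 1 and $\eta = 1/\sqrt{10}$.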


4.1.2 Matrix Iteration 


Discrete master equations derived for systems with low dimensionality
(i.e. one or two state variables) can often be usefully re-written in
vector-matrix notation, with the probability of finding the system in a (discrete)
state $n \in (0, N)$, written $p_n$, occupying the elements of an $(N+1)$-element
column vector $\mathbf{p}$, and the transition probabilities occupying the elements of an
$(N+1) \times (N+1)$ matrix $\mathbf{W}$,

$$\mathbf{p}(t+1) = \mathbf{W} \cdot \mathbf{p}(t). \tag{4.17}$$

Notice, also, that the continuous time master equation,

$$\frac{dp_n}{dt} = \sum_{n'} \left[w_{nn'}\,p_{n'} - w_{n'n}\,p_n\right],$$

can likewise be expressed in terms of the transition matrix $\mathbb{W}$,

$$\frac{d\mathbf{p}}{dt} = \mathbb{W} \cdot \mathbf{p},$$

where

$$\mathbb{W}_{nn'} = w_{nn'} - \delta_{nn'} \sum_{n''} w_{n''n}.$$

Although Eq. 4.17 could be generalized to an unbounded state space
$n \in (-\infty, \infty)$, it is clearly most useful for events restricted to a bounded
domain. The evolution of the probability distribution is defined as an
iterative matrix multiplication,

$$\mathbf{p}(t) = \mathbf{W}^t \cdot \mathbf{p}(0). \tag{4.18}$$

The steady-state distribution is then the eigenvector corresponding to the
$\lambda = 1$ eigenvalue of the matrix $\mathbf{W}_s$, obtained from multiplying $\mathbf{W}$ by itself
an infinite number of times,

$$\mathbf{W}_s = \lim_{t \to \infty} \mathbf{W}^t. \tag{4.19}$$

That is, the steady-state probability distribution $\mathbf{p}_s$ corresponds to the
unit eigenvector of $\mathbf{W}_s$;

$$\mathbf{W}_s \cdot \mathbf{p}_s = \mathbf{p}_s.$$

A theorem by Perron and Frobenius states that if $\mathbf{W}$ is irreducible, then
there is a unique stationary distribution $\mathbf{p}_s$. Furthermore, if the system
is 'aperiodic' (that is, for systems whose deterministic counterparts have a
single asymptotically stable fixed point), then once $\mathbf{W}_s$ is calculated,
multiplication with any unit vector yields $\mathbf{p}_s$. If $\mathbf{W}$ is a separable transition
matrix,

$$\mathbf{W} = \begin{pmatrix} \mathbf{A} & 0 \\ 0 & \mathbf{B} \end{pmatrix},$$

different initial conditions converge to different equilibrium distributions
(Exercise 1).
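The iteration of Eq. 4.18 is easy to try on a small example. The sketch below uses two hypothetical 3-state matrices chosen here purely for illustration (they do not appear in the text): an irreducible matrix, for which every initial condition reaches the same steady state, and a separable (block-diagonal) one, for which the limit depends on where the system starts. The convention assumed is that each column of $\mathbf{W}$ sums to one, so that $\mathbf{W}$ acts on a column vector of probabilities.

```python
def matvec(W, p):
    """Matrix-vector product W . p for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, p)) for row in W]

def iterate_to_steady_state(W, p0, tol=1e-13, max_iter=100000):
    """Apply p(t+1) = W . p(t) (Eq. 4.17) until successive iterates agree."""
    p = list(p0)
    for _ in range(max_iter):
        p_next = matvec(W, p)
        if max(abs(a - b) for a, b in zip(p_next, p)) < tol:
            return p_next
        p = p_next
    raise RuntimeError("matrix iteration did not converge")

# Hypothetical irreducible, aperiodic matrix (columns sum to 1).
W = [[0.5, 0.2, 0.0],
     [0.5, 0.5, 0.3],
     [0.0, 0.3, 0.7]]
p_s = iterate_to_steady_state(W, [1.0, 0.0, 0.0])
p_s_other_start = iterate_to_steady_state(W, [0.0, 0.0, 1.0])

# Hypothetical separable matrix: states {0, 1} never communicate with state 2.
W_sep = [[0.9, 0.5, 0.0],
         [0.1, 0.5, 0.0],
         [0.0, 0.0, 1.0]]
p_block_a = iterate_to_steady_state(W_sep, [1.0, 0.0, 0.0])
p_block_b = iterate_to_steady_state(W_sep, [0.0, 0.0, 1.0])
```

For the irreducible matrix both starting points converge to $(1/6, 5/12, 5/12)$, while for the separable matrix the two starting points converge to $(5/6, 1/6, 0)$ and $(0, 0, 1)$ respectively.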

For many processes, the transition matrix $\mathbf{W}$ does not depend upon
time. Furthermore, for one-step processes, the transition matrix takes the
particularly simple tri-diagonal form,

$$\mathbf{W} = \begin{pmatrix}
T_0 & T_- & 0 & \cdots & 0 \\
T_+ & T_0 & T_- & \ddots & \vdots \\
0 & T_+ & T_0 & \ddots & 0 \\
\vdots & \ddots & \ddots & \ddots & T_- \\
0 & \cdots & 0 & T_+ & T_0
\end{pmatrix},$$

where it should be understood that the elements $T_0$, $T_+$ and $T_-$ are
themselves functions of the row in which they appear. Matrix iteration provides
a particularly convenient method to estimate the accuracy of Laplace's
approximate solution of the Bernoulli urn problem (see Section 3.6 on
page 64).


Example — Bernoulli’s urn model. The following problem was posed 
by Bernoulli (see Figure 3.3 on page 64): 


There are n white balls and n black balls in two urns, and 
each urn contains n balls. The balls are moved cyclically, one 
by one, from one urn to another. What is the probability $z_{x,r}$
that after $r$ cycles, urn A will contain $x$ white balls?


Figure 4.1: Steady-state distribution for the fraction of white
balls in one urn of Bernoulli's urn model. The eigenvector of the
repeated multiplication of the transition matrix is shown as filled circles,
with Laplace's approximation (cf. Eq. 3.23 on page 66) shown as a solid
line, for increasing numbers of balls. A) Number of balls in one urn,
n = 10. B) n = 50. C) n = 100. As the size of the transition matrix
becomes larger ($n \to \infty$), repeated multiplication becomes computationally
demanding; however, as $n \to \infty$, Laplace's approximation becomes
indistinguishable from the 'exact' numerical solution.


In attempting to solve this problem, Laplace derived the following
difference equation for the evolution of $z_{x,r}$,

$$z_{x,r+1} = \left(\frac{x+1}{n}\right)^2 z_{x+1,r} + 2\,\frac{x}{n}\left(1 - \frac{x}{n}\right) z_{x,r} + \left(1 - \frac{x-1}{n}\right)^2 z_{x-1,r}.$$

Assuming the number of balls is large ($n \to \infty$) so that changes in the
density $x/n$ are nearly continuous, along with the ansatz that $x = n/2 + \mu\sqrt{n}$,
Laplace arrived at a partial differential equation governing the probability
distribution for $\mu$, $U(\mu, r')$,

$$\frac{\partial U(\mu, r')}{\partial r'} = 2\,\frac{\partial}{\partial \mu}\left(\mu\,U(\mu, r')\right) + \frac{1}{2}\,\frac{\partial^2}{\partial \mu^2}\left(\frac{1}{2}\,U(\mu, r')\right). \tag{4.21}$$

The steady-state solution for the density $\phi = x/n$ is the Gaussian,
centered at $\phi = 1/2$ with variance $1/8n$,

$$P(\phi) = 2\sqrt{\frac{n}{\pi}}\,\exp\left[-4n\left(\phi - \frac{1}{2}\right)^2\right]. \tag{4.22}$$

Re-writing Laplace's difference equation in vector-matrix notation allows
us to use matrix iteration to estimate the error in Laplace's approximation.

Writing the probability $z_{x,r}$ as a vector $\mathbf{z}(r) \in \mathbb{R}^{(n+1)}$, where $z_i(r) = z_{i-1,r}$,
the transition rates appearing in the $i$th row are given by,

$$W_{i,i} = 2\,\frac{i-1}{n}\left(1 - \frac{i-1}{n}\right); \quad W_{i,i+1} = \left(\frac{i}{n}\right)^2; \quad W_{i,i-1} = \left(1 - \frac{i-2}{n}\right)^2. \tag{4.23}$$

For example, with n = 4 balls in each urn, $\mathbf{W}$ is the 5 x 5 matrix,

$$\mathbf{W} = \begin{pmatrix}
0 & \frac{1}{16} & 0 & 0 & 0 \\
1 & \frac{3}{8} & \frac{1}{4} & 0 & 0 \\
0 & \frac{9}{16} & \frac{1}{2} & \frac{9}{16} & 0 \\
0 & 0 & \frac{1}{4} & \frac{3}{8} & 1 \\
0 & 0 & 0 & \frac{1}{16} & 0
\end{pmatrix}.$$

The steady-state transition matrix $\mathbf{W}_s$ is given by,

$$\mathbf{W}_s = \frac{1}{70}\begin{pmatrix}
1 & 1 & 1 & 1 & 1 \\
16 & 16 & 16 & 16 & 16 \\
36 & 36 & 36 & 36 & 36 \\
16 & 16 & 16 & 16 & 16 \\
1 & 1 & 1 & 1 & 1
\end{pmatrix},$$

so that the steady-state probability distribution is $\mathbf{z} = \frac{1}{70}[1\;\;16\;\;36\;\;16\;\;1]$.
Continuing in the same fashion for larger n, we see that Laplace's
approximation of the distribution is quite accurate even down to n = 10
(Figure 4.1).
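The n = 4 calculation can be reproduced with a few lines of code. The sketch below is an illustration, not the author's implementation: the function names are invented here, and the matrix is built from Eq. 4.23 using zero-based indices (row and column x stand for x white balls in urn A). Exact rational arithmetic confirms that the quoted distribution is a fixed point, and floating-point iteration shows that an arbitrary starting state converges to it.

```python
from fractions import Fraction

def urn_matrix(n):
    """Transition matrix of Eq. 4.23; W[i][j] is the probability of x = j -> x = i."""
    W = [[Fraction(0)] * (n + 1) for _ in range(n + 1)]
    for x in range(n + 1):               # x = current number of white balls in urn A
        f = Fraction(x, n)
        W[x][x] = 2 * f * (1 - f)        # white-for-white or black-for-black exchange
        if x > 0:
            W[x - 1][x] = f ** 2         # urn A loses a white ball
        if x < n:
            W[x + 1][x] = (1 - f) ** 2   # urn A gains a white ball
    return W

def matvec(W, p):
    return [sum(w * x for w, x in zip(row, p)) for row in W]

n = 4
W = urn_matrix(n)

# Exact check that (1, 16, 36, 16, 1)/70 is unchanged by one multiplication.
z_exact = [Fraction(k, 70) for k in (1, 16, 36, 16, 1)]
is_fixed_point = matvec(W, z_exact) == z_exact

# Matrix iteration in floating point, starting with no white balls in urn A.
W_float = [[float(w) for w in row] for row in W]
z = [1.0] + [0.0] * n
for _ in range(200):                     # 200 iterations is an arbitrary, generous choice
    z = matvec(W_float, z)
max_error = max(abs(a - float(b)) for a, b in zip(z, z_exact))
```

Calling `urn_matrix` with a larger n gives the matrices needed for the comparison with Laplace's Gaussian approximation.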


4.2 Approximation Methods 


There are two broad classes of approximation methods: numerical
simulation algorithms and perturbation methods. Each has clear advantages
and disadvantages.


1. Numerical Simulation: The classic reference for these types of
algorithms is the paper by Gillespie: D. T. Gillespie (1977) "Exact
stochastic simulation of coupled chemical reactions," Journal of
Physical Chemistry 81: 2340.


The method simulates a single trajectory n(t) that comes from the
unknown probability distribution P(n,t) characterized by the master
equation.

2. Perturbation Methods: The classic reference for the perturbation
scheme we will consider is the paper by van Kampen: N. G. van
Kampen (1976) "Expansion of the Master equation," Advances in
Chemical Physics 34: 245.


For perturbation methods, the discrete jump in n(t) that occurs
with each reaction is treated as a nearly deterministic process. The
'nearly' is what we use as a perturbation parameter.


The Brusselator 
• S. Strogatz, Nonlinear dynamics and chaos: With applications to physics,
biology, chemistry and engineering (Perseus Books Group, 2001).


We shall develop both the stochastic simulation algorithm and the linear
noise approximation in the context of a specific example: the Brusselator.
The Brusselator is an often-used pedagogical example of a system exhibiting
limit cycle behaviour over a range of parameter values. The model is
a two-species chemical network described by the following reactions,

$$\emptyset \xrightarrow{\gamma} X_1,$$
$$2X_1 + X_2 \xrightarrow{a} 3X_1,$$
$$X_1 \xrightarrow{b} X_2,$$
$$X_1 \xrightarrow{\delta} \emptyset.$$

Without loss of generality, the time and volume are scaled so that
$\gamma = \delta = 1$ to give the deterministic rate equations,

$$\frac{dX_1}{dt} = 1 + aX_1^2X_2 - (b+1)X_1, \qquad \frac{dX_2}{dt} = -aX_1^2X_2 + bX_1, \tag{4.24}$$

and consequently, the re-scaled master equation governing the probability
density $P(n_1, n_2, t)$,

$$\frac{\partial P}{\partial t} = \Omega\left(\mathbf{E}_1^{-1} - 1\right)P + \frac{a}{\Omega^2}\left(\mathbf{E}_1^{-1}\mathbf{E}_2^{1} - 1\right)n_1(n_1 - 1)\,n_2\,P + \left(\mathbf{E}_1^{1} - 1\right)n_1 P + b\left(\mathbf{E}_1^{1}\mathbf{E}_2^{-1} - 1\right)n_1 P. \tag{4.25}$$


In this form, the deterministic rate equations admit a single stable
equilibrium point $(X_1, X_2) = (1, b/a)$ for all choices of parameters with $b < 1 + a$.
Along the critical line $b = 1 + a$, the system undergoes a Hopf bifurcation,
and for $b > 1 + a$ the steady-state is a stable limit cycle (Figure 4.2). Notice that
the transition rates are nonlinear in n; as a consequence, there is no
general solution method available to determine P(n,t). Nevertheless, it
is precisely the nonlinearity of the transition rates that makes this model
interesting, and therefore the Brusselator is a terrific example to showcase
numerical and analytic approximation methods.


Figure 4.2: The stability of the Brusselator. a) With suitably chosen
units, there are two parameters in the Brusselator model. The stability of
the trajectory in phase space is determined by the relationship between a
and b in parameter space (limit cycle about the fixed point for $b > 1 + a$;
fixed point is stable for $b < 1 + a$). b) For $b > 1 + a$, a trajectory will move away
from the fixed point, and orbit on a closed limit cycle. c) For $b < 1 + a$,
a trajectory will move toward the stable fixed point.

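We shall express both the stochastic simulation algorithm of Gillespie
and the linear noise approximation of van Kampen in terms of the
propensity vector $\boldsymbol{\nu}$ and the stoichiometry matrix $\mathbf{S}$ introduced in
Chapter 3 (Section 3.7 on page 71). From the reaction network shown above,
with $\gamma = \delta = 1$, the propensity vector and stoichiometry matrix are given
by,

$$\boldsymbol{\nu} = \begin{bmatrix} 1 \\[4pt] a \cdot \dfrac{n_1 \cdot (n_1 - 1) \cdot n_2}{\Omega^3} \\[8pt] b \cdot \dfrac{n_1}{\Omega} \\[8pt] \dfrac{n_1}{\Omega} \end{bmatrix}, \tag{4.26}$$

and,

$$\mathbf{S} = \begin{bmatrix} 1 & 1 & -1 & -1 \\ 0 & -1 & 1 & 0 \end{bmatrix}, \tag{4.27}$$

where the columns of $\mathbf{S}$ correspond, in order, to the reactions with propensities $\nu_1$, $\nu_2$, $\nu_3$, $\nu_4$.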
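The propensity vector and stoichiometry matrix translate directly into code. The sketch below assumes the reading of Eqs. 4.26 and 4.27 in which the propensities are rates per unit volume, ordered as synthesis, autocatalysis, conversion of $X_1$ to $X_2$, and degradation; that ordering is inferred from the reaction list and from Eq. 4.25, and the function names are hypothetical. As a consistency check, the product $\mathbf{S}\cdot\boldsymbol{\nu}$ should reproduce the right-hand side of the deterministic rate equations (Eq. 4.24) when the molecule numbers are large.

```python
def brusselator_propensities(n1, n2, a, b, omega):
    """Propensities per unit volume (assumed form of Eq. 4.26)."""
    return [1.0,                                   # 0 -> X1
            a * n1 * (n1 - 1) * n2 / omega ** 3,   # 2 X1 + X2 -> 3 X1
            b * n1 / omega,                        # X1 -> X2
            n1 / omega]                            # X1 -> 0

# Stoichiometry matrix (assumed form of Eq. 4.27): rows are species, columns are reactions.
S = [[1, 1, -1, -1],
     [0, -1, 1, 0]]

def drift(n1, n2, a, b, omega):
    """S . nu, the stochastic analogue of the right-hand side of Eq. 4.24."""
    nu = brusselator_propensities(n1, n2, a, b, omega)
    return [sum(S[i][j] * nu[j] for j in range(len(nu))) for i in range(len(S))]

def deterministic_rhs(x1, x2, a, b):
    """Right-hand side of Eq. 4.24 in concentrations."""
    return [1.0 + a * x1 ** 2 * x2 - (b + 1.0) * x1,
            -a * x1 ** 2 * x2 + b * x1]

# Illustrative values: the fixed point (1, b/a) scaled by a system size of 1000.
a, b, omega = 0.1, 0.2, 1000.0
n1, n2 = 1000, 2000
d_stochastic = drift(n1, n2, a, b, omega)
d_deterministic = deterministic_rhs(n1 / omega, n2 / omega, a, b)
```

At the fixed point the deterministic right-hand side vanishes, and the stochastic drift differs from it only by terms of order $1/\Omega$ that come from the factor $n_1(n_1 - 1)$.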
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4.2.1 Numerical Methods — Gillespie’s Algorithm 


For master equations with nonlinear transition probabilities, the full dis- 
tribution P(n,t) can rarely be solved exactly. Gillespie’s algorithm is a 
method by which an individual sample path, starting at a given initial 
point, can be simulated in time such that it conforms to the unknown 
probability distribution we seek; that is, for a sufficiently large popula- 
tion of sample paths, the inferred probability distribution is as near to 
the exact solution as we wish (analogous with Langevin’s modeling of 
Brownian motion). The algorithm proceeds in 3 steps: 


1. The propensities $\nu_j$ are used to generate a probability distribution
for the next reaction time, $\tau$, and $\tau$ is drawn from this distribution.


2. The propensities are used to generate a probability distribution for
which reaction in the network will occur next, i.e. which of the $\nu_j$'s
is completed at time $t + \tau$. Call the reaction index $\mu$.


3. The time is advanced $t \to t + \tau$ and the state is updated using the
stoichiometry matrix: for each reactant, $n_i \to n_i + S_{i\mu}$. Repeat ...


In this way, we generate a discrete time series for the reactant numbers. 
We shall go over each of these steps in greater detail (see Figure 4.3), and 
it will be clear that in developing his algorithm, Gillespie built upon some 
fundamental properties of stochastic processes. It is also instructive to see 
Gillespie’s algorithm in action, so we generate some stochastic simulations 
of the Brusselator model introduced on page 88. 


Details of the stochastic simulation algorithm (Figure 4.3) 


• D. T. Gillespie (1977) "Exact stochastic simulation of coupled chemical
reactions," Journal of Physical Chemistry 81: 2340.


• Markov Processes, D. T. Gillespie (Academic Press, Inc, 1992).


To advance the state with each reaction event, we need two random
variables: the time for the completion of the next reaction $\tau$, and the
index of the reaction that fires $\mu$. It is Gillespie's deep insight that allows
us to determine the probability distribution for each of these, and in
fact how to generate the pair $(\tau, \mu)$ using a unit uniform random number
generator.
Ultimately, we are after the conditional probability distribution,

    $p(\mathbf{n} + \Delta\mathbf{n}, t + \tau\,|\,\mathbf{n}, t)\,d\tau$: the probability that, given the
    system is in state $\mathbf{n}$ at time $t$, the next jump occurs between $t + \tau$
    and $t + \tau + d\tau$, carrying the state from $\mathbf{n}$ to $\mathbf{n} + \Delta\mathbf{n}$,

from which we draw the random numbers $(\tau, \Delta\mathbf{n})$ to advance the system.
In practice, because the stoichiometry matrix $\mathbf{S}$ records how the state
advances with each reaction event, it is sufficient that we simply generate
the random variable specifying which reaction has fired, $(\tau, \mu)$.


    1. Initialize: $t \leftarrow t_0$, $\mathbf{n} \leftarrow \mathbf{n}_0$.

    2. Pick $\tau$ according to the density function (a)
       $p_1(\tau|\mathbf{n},t) = a(\mathbf{n})\exp[-a(\mathbf{n})\tau]$.

    3. Pick $\mu$ according to the density function (b)
       $p_2(\Delta\mathbf{n}_\mu|\mathbf{n},t) = \nu_\mu(\mathbf{n})/a(\mathbf{n})$.

    4. Advance the process (c): $t \leftarrow t + \tau$, $\mathbf{n} \leftarrow \mathbf{n} + \Delta\mathbf{n}_\mu$.

    5. Record as required for sampling or plotting:
       $\mathbf{n}(t') = \mathbf{n} - \Delta\mathbf{n}_\mu$ for $t - \tau \le t' < t$, and $\mathbf{n}(t') = \mathbf{n}$ for $t' = t$.
       If the process is to continue, then return to 2;
       otherwise, stop.


Figure 4.3: Gillespie's algorithm for stochastic simulation of the
master equation. a) Use the inversion method for generating the
random number $\tau$ (see Section A.4). If $a(\mathbf{n},t) = a(\mathbf{n})$, then the
inversion is easy: Draw a unit uniform random number $r_1$, and take
$\tau = [1/a(\mathbf{n})]\ln(1/r_1)$. b) Draw a unit uniform random number $r_2$ and
take $\mu$ to be the smallest integer for which the sum over $\nu_j(\mathbf{n})/a(\mathbf{n})$ from
$j = 1$ to $j = \mu$ exceeds $r_2$. Notice the jump in the state $\Delta\mathbf{n}_\mu = \mathbf{S}_\mu$, where
$\mathbf{S}_\mu$ is the $\mu$th column of the stoichiometry matrix. c) In a simulation run
containing $\sim 10^K$ jumps, the sum $t + \tau$ should be computed with at least
$K + 3$ digits of precision. Taken from Gillespie (1992), p. 331.

Following Gillespie, we introduce the probability $q(\mathbf{n},t;\tau)$ that the
system in state $\mathbf{n}(t)$ will jump at some instant between $t$ and $t + \tau$. From
the microscopic transition rates $\boldsymbol{\nu}(\mathbf{n})$, we know that over an infinitesimal
interval $dt$,

$$q(\mathbf{n},t;dt) = \left[\text{Prob. Reaction 1 Occurs} + \text{Prob. Reaction 2 Occurs} + \ldots\right]
= \left[\nu_1(\mathbf{n})\,dt + \nu_2(\mathbf{n})\,dt + \ldots\right] = \left[\sum_{j=1}^{N}\nu_j(\mathbf{n})\right]dt \equiv a(\mathbf{n})\,dt. \tag{4.28}$$

Over an infinitesimal interval, at most one jump can occur, so we have
that the probability that no jump occurs, $q^*(\mathbf{n},t;d\tau)$, is $q^*(\mathbf{n},t;d\tau) =
1 - q(\mathbf{n},t;d\tau)$. Over a non-infinitesimal interval, the probability that no
jump occurs is,

$$q^*(\mathbf{n},t;\tau) = \exp\left[-a(\mathbf{n})\,\tau\right], \tag{4.29}$$

(Exercise 6). In that way, the probability $p(\mathbf{n} + \Delta\mathbf{n}, t + \tau|\mathbf{n},t)\,d\tau$ is written,

$$p(\mathbf{n} + \Delta\mathbf{n}, t + \tau|\mathbf{n},t)\,d\tau = q^*(\mathbf{n},t;\tau) \times a(\mathbf{n})\,d\tau \times w(\Delta\mathbf{n}|\mathbf{n},t+\tau),$$

where the three factors are, in order: the probability the state will NOT
jump during $[t, t+\tau]$; the probability the state WILL jump in
$[t+\tau, t+\tau+d\tau]$; and the probability that, given the state jumps at $t + \tau$,
it will land in $\mathbf{n} + \Delta\mathbf{n}$.

The first two terms on the right-hand side determine the next reaction
time $\tau$, while the last term determines the next reaction index. Therefore,
we can factor the conditional probability into two parts $p_1(\tau)$ and $p_2(\Delta\mathbf{n})$,

$$p(\mathbf{n} + \Delta\mathbf{n}, t + \tau|\mathbf{n},t)\,d\tau = \underbrace{a(\mathbf{n})\exp\left[-a(\mathbf{n})\tau\right]d\tau}_{p_1(\tau|\mathbf{n},t)} \times \underbrace{w(\Delta\mathbf{n}|\mathbf{n},t+\tau)}_{p_2(\Delta\mathbf{n}|\mathbf{n},t+\tau)}.$$

Probability $p_1(\tau)$ is exponentially distributed; consequently a unit
uniform random number $r_1$ can be used to simulate $\tau$ via the inversion (see
Section A.4),

$$\tau = \frac{1}{a(\mathbf{n})}\ln(1/r_1). \tag{4.30}$$


The index of the next reaction $\mu$ is a little more subtle. Notice from
Eq. 4.28 that the probability of reaction $\mu$ to occur is proportional to
the rate of reaction $\nu_\mu$. The normalization condition ensures that the
probability it is reaction $\mu$ that has caused the state to jump is,

$$p_2(\Delta\mathbf{n}_\mu) = \frac{\nu_\mu(\mathbf{n})}{\sum_j \nu_j(\mathbf{n})} \equiv \frac{\nu_\mu(\mathbf{n})}{a(\mathbf{n})}, \tag{4.31}$$

where $\Delta\mathbf{n}_\mu$ is the $\mu$th column of the stoichiometry matrix $\mathbf{S}$. The next
reaction index $\mu$ is simulated using a unit uniform random number $r_2$ via
the integer inversion method (see Exercise 4b on p. 279). That is, the
index $\mu$ drawn from $p_2$ is the first integer for which

$$\frac{1}{a(\mathbf{n})}\sum_{j=1}^{\mu}\nu_j(\mathbf{n}) > r_2. \tag{4.32}$$
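The two inversions, Eq. 4.30 for the waiting time and Eq. 4.32 for the reaction index, can each be written as a short function. The sketch below is illustrative: the function names and the propensity values are hypothetical, and the returned index is zero-based (so reaction $\mu$ in the text corresponds to index $\mu - 1$).

```python
import math
import random

def draw_tau(a_total, r1):
    """Eq. 4.30: invert the exponential waiting-time distribution, r1 in (0, 1]."""
    return math.log(1.0 / r1) / a_total

def draw_mu(propensities, r2):
    """Eq. 4.32: first (zero-based) index whose normalized cumulative
    propensity exceeds r2, with r2 in [0, 1)."""
    a_total = sum(propensities)
    cumulative = 0.0
    last_possible = 0
    for j, v in enumerate(propensities):
        if v > 0.0:
            last_possible = j
        cumulative += v / a_total
        if cumulative > r2:
            return j
    return last_possible   # only reached through floating-point round-off

rng = random.Random(0)     # seeded only so that the sketch is reproducible
nu = [1.0, 3.0]            # hypothetical propensities of two reactions
r1 = 1.0 - rng.random()    # shifted into (0, 1] so that log(1/r1) is finite
r2 = rng.random()
tau = draw_tau(sum(nu), r1)
mu = draw_mu(nu, r2)
```

With propensities 1 and 3, any $r_2 < 0.25$ selects the first reaction and any larger value selects the second, so the second reaction is chosen three times as often, as Eq. 4.31 requires.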


With $(\tau, \mu)$, the system is updated,

$$t \to t + \tau,$$
$$n_i \to n_i + S_{i\mu},$$

and the algorithm is repeated as long as desired, each time drawing two
random numbers $(r_1, r_2)$ from a unit uniform random number generator.
An implementation of Gillespie's algorithm coded in Matlab is annotated
in the Appendix (see Section D.1.2 on p. 306).
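For readers who prefer to see the whole loop in one place, the following is a minimal Python sketch of the steps described above; it is not the Matlab code of the Appendix, and all names are hypothetical. The function `rates` is assumed to return total rates, so propensities written per unit volume (as in Eq. 4.25) must be multiplied by $\Omega$ before being passed in. The usage example applies the loop to the birth-death (Poisson) process of Section 4.1.1, with illustrative values $\alpha = 10$ and $\beta = 1$, for which the long-time average of $n$ should be close to $\alpha/\beta$.

```python
import math
import random

def gillespie(n0, S, rates, t_end, rng):
    """Simulate one sample path up to t_end.
    S[i][j] is the change in species i when reaction j fires."""
    t, n = 0.0, list(n0)
    times, states = [t], [tuple(n)]
    while True:
        nu = rates(n)
        a = sum(nu)
        if a <= 0.0:
            break                                    # no reaction can fire
        tau = math.log(1.0 / (1.0 - rng.random())) / a   # Eq. 4.30
        if t + tau > t_end:
            break                                    # state is held until t_end
        r2, cumulative, mu = rng.random(), 0.0, len(nu) - 1
        for j, v in enumerate(nu):                   # Eq. 4.32
            cumulative += v / a
            if cumulative > r2:
                mu = j
                break
        t += tau
        for i in range(len(n)):
            n[i] += S[i][mu]
        times.append(t)
        states.append(tuple(n))
    return times, states

def time_average(times, states, species, t_end):
    """Average of a piecewise-constant path over [times[0], t_end]."""
    total = 0.0
    for k in range(len(times)):
        t_next = times[k + 1] if k + 1 < len(times) else t_end
        total += states[k][species] * (t_next - times[k])
    return total / (t_end - times[0])

# Usage: birth at constant rate alpha, death at rate beta * n.
alpha, beta, t_end = 10.0, 1.0, 2000.0
times, states = gillespie([10], [[1, -1]],
                          lambda n: [alpha, beta * n[0]],
                          t_end, random.Random(1))
mean_n = time_average(times, states, 0, t_end)
```

The path is recorded only at jump times, which is sufficient because the state is constant between jumps; the time average therefore weights each recorded state by how long it persists.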


Stochastic simulation of the Brusselator model 


We shall produce stochastic simulations of the Brusselator model in both
the stable and limit-cycle regimes of the model parameter space (see
Section D.1.3 on p. 308 for example Matlab code). We shall find that both
regimes are equally accessible to the stochastic simulation algorithm. In
contrast, the linear noise approximation described in the next section
requires a major modification to treat fluctuations around the limit cycle.
On the other hand, it is very difficult to describe the simulation results
any more than qualitatively, while the perturbation methods provide a
connection between system behaviour and model parameters.


Figure 4.4: Gillespie's stochastic simulation algorithm - The
Brusselator in the stable regime. a) (a,b) = (0.1,0.2), (a = 10 in Figure
23 of Gillespie (1977)). The system is very stable. b) (a,b) = (0.5, 1),
(a = 2 in Figure 22 of Gillespie (1977)). The system is still stable, but
fluctuations are more appreciable.

Stable regime, $b < 1 + a$: In the stable regime of the model parameter
space, the system admits a single, stable equilibrium point. In Figure
4.4, that equilibrium is $(x_1, x_2) = (1000, 2000)$. The number of reactant
molecules is large, so the intrinsic fluctuations are correspondingly small.
As the parameters get closer to the Hopf bifurcation, the fluctuations
become somewhat larger (Figure 4.4b). Neither plot illustrates particularly
interesting behaviour.

Limit cycle, $b \ge 1 + a$: At the bifurcation (Figure 4.5), the system
parameters are on the threshold of stability and the fluctuations carry
the state on long excursions away from the fixed point. From the
time-series plot (Figure 4.5a), it appears as if the fluctuations are generating
nearly regular oscillations (see Exercise 5 on p. 214). In phase-space
(Figure 4.5b), the system seems confined to an elongated ellipse with a
negatively-sloped major axis.

Beyond the Hopf bifurcation, the system exhibits regular limit cycle
oscillations (Figure 4.6). As Gillespie notes in his original article, the
horizontal leg of the cycle seems to travel along grooves of negative slope,
rather than straight from right to left. There is some spread along the
diagonal leg, but both the horizontal and vertical legs are little influenced
by the fluctuations (for analytic insight into why that is, see Section 11.2).
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Figure 4.5: Gillespie’s stochastic simulation algorithm - The Brus- 
selator at the Hopf bifurcation. a) Simulation of the system at the 
Hopf bifurcation, (a,b) = (1,2) (a = 1 in Figure 21 of Gillespie (1977)). 
The fluctuations generate what appear to be almost regular oscillations. 
b) In phase-space, the fluctuations are confined to a large ellipse, an- 
gled with negative slope indicating the strong cross-correlation between 
fluctuations in $x_1$ and $x_2$.
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Figure 4.6: Gillespie’s stochastic simulation algorithm - The Brus- 
selator in the unstable regime. a) (a,b) = (5,10) (a = 0.2 in Figure 
19 of Gillespie (1977)). b) (a,b) = (10,20) (a = 0.1 in Figure 18 of 
Gillespie (1977)). 


The advantages of the stochastic simulation algorithm are that it is 
simple to program and provides an output trajectory that exactly conforms 
to the solution distribution of the master equation. The disadvantages are 
that the original algorithm is computationally expensive and the method 
does not scale well as the number of molecules gets large (although there 
are approximate algorithms that alleviate some of the computational bur- 
den). Most importantly, the method suffers from the same limitations as 
any numerical scheme — there is a lack of deep insight into the model 
and it is difficult to systematically explore different regions of parameter 
space. Nevertheless, Gillespie’s algorithm is the benchmark against which 
all other methods of solving the master equation are measured. 
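
To make the claim that the algorithm is simple to program concrete, the following is a minimal sketch of the direct method applied to the Brusselator. The mass-action propensities, the function names (`propensities`, `pick_reaction`, `simulate`) and the parameter values are assumptions made for illustration; they are not quoted from the text, and the sketch is not an optimized implementation.

```python
import math
import random

# Change (dn1, dn2) produced by each of the four assumed reactions.
STOICHIOMETRY = [(1, 0), (1, -1), (-1, 1), (-1, 0)]


def propensities(n1, n2, a, b, omega):
    """Assumed mass-action propensities (events per unit time)."""
    return [
        float(omega),                       # 0 -> X1
        a * n1 * (n1 - 1) * n2 / omega**2,  # 2 X1 + X2 -> 3 X1
        b * n1,                             # X1 -> X2
        float(n1),                          # X1 -> 0
    ]


def pick_reaction(props, threshold):
    """Index of the first reaction whose cumulative propensity exceeds threshold."""
    cumulative = 0.0
    chosen = None
    for j, p in enumerate(props):
        if p <= 0.0:
            continue  # a reaction with zero propensity can never fire
        chosen = j
        cumulative += p
        if threshold < cumulative:
            break
    return chosen


def simulate(n1, n2, a, b, omega, t_end, rng):
    """Return the event times and the (n1, n2) states visited up to t_end."""
    t = 0.0
    times, states = [t], [(n1, n2)]
    while t < t_end:
        props = propensities(n1, n2, a, b, omega)
        total = sum(props)
        # Exponentially distributed waiting time until the next reaction.
        t += -math.log(1.0 - rng.random()) / total
        # Choose which reaction fires, weighted by its propensity.
        j = pick_reaction(props, rng.random() * total)
        n1 += STOICHIOMETRY[j][0]
        n2 += STOICHIOMETRY[j][1]
        times.append(t)
        states.append((n1, n2))
    return times, states


rng = random.Random(1)
times, states = simulate(n1=50, n2=100, a=0.5, b=1.0, omega=50, t_end=5.0, rng=rng)
print(len(times), states[-1])
```

Each pass through the loop draws two uniform random numbers: one sets the waiting time and the other selects the reaction. The cost per unit of simulated time grows with the total propensity, which is why the method becomes expensive as the number of molecules increases.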


Suggested References 


For numerical simulation methods, the textbook by Gillespie is unsur- 
passed, along with his seminal article on the stochastic simulation algo- 
rithm, 


• D. T. Gillespie, Markov processes: An introduction for physical 
scientists (Academic Press, 1992). 


• D. T. Gillespie (1977) “Exact stochastic simulation of coupled chemical 
reactions,” Journal of Physical Chemistry 81: 2340. 


Exercises 


1. Separable transition matrices: Write out the most general 2 × 2 
transition matrix W. Show that for this case, any initial 
condition p(0) converges to a well-defined steady-state distribution 
$p^{\infty}$, i.e. $\lim_{n\to\infty} W^{n} \cdot p(0) = p^{\infty}$, with two exceptions. 


2. Time-dependent reaction rates: A chemical reaction is de- 
scribed by the following deterministic rate equation, 


a0 = KOA), 


where k(t) is a time-dependent reaction rate. Solve the associated 
chemical master equation using a moment generating function. 


3. Bursty Poisson model: Consider the following generalization of 
the Poisson model: synthesis occurs with a constant rate $\alpha$, but 
in ‘bursts’ of size $b$, and degradation is linear, with rate $\beta n$. The 
master equation corresponding to this process is, 

$$\frac{dP(n,t)}{dt} = \alpha\left[P(n-b,t) - P(n,t)\right] + \beta\left[(n+1)\,P(n+1,t) - n\,P(n,t)\right].$$

(a) For $b = 1$, solve the characteristic function $Q(z,t)$ for all time. 
What do you notice about the distribution if $n_0 = 0$? 


(b) Repeat 3a, but for arbitrary burst size. 


4. Bernoulli’s urn model: In describing Bernoulli’s urn model, Laplace 
derived a difference equation with nonlinear transition rates (Eq. 4.20). 
The nonlinearity of the transition rates makes the equation difficult 
to solve even at steady-state. 


(a) Using matrix iteration for the cases n = 2,3, and 4, multiply 
the steady-state probability distribution z by its first element 
z, to get a vector of integers. Can you spot a pattern in the 
individual entries? Postulate a general solution for the steady- 
state probability distribution and verify that it satisfies the 
difference equation. 


(b) Using Laplace’s approximate solution (Eq. 4.22), calculate the 
mean-squared error between the exact solution and the contin- 
uous approximation as a function of the number of balls. 


5. Asymmetric cell division: Most bacteria divide with surprising 
symmetry — E. coli for example, typically divides into daughter cells 
that differ in length by less than 5%. Suppose a bacterium divided 
unequally, what would the age distribution (i.e., the time to next 
division) look like for a population? 


(a) Divide the lifetime of the cell into time-steps $\Delta t$. Assume the 
larger daughter lives $10\Delta t$ before division, while the smaller 
daughter lives $15\Delta t$. Scale the time so that the transition rates 
are 1. Write out the transition matrix W. 


(b) Find W*. What does the equilibrium lifetime distribution look 
like? 

(c) Repeat the above, but on a finer scale. That is, assume the 
large daughter lives $100\Delta t$ and the small daughter lives $150\Delta t$. 


(d) Derive a deterministic model for the process. How do the re- 
sults above compare? 

(e) Suppose the size after division was itself a stochastic process. 
How would the transition matrix change? 


6. Gillespie’s simulation algorithm: Gillespie draws upon two very 
important ideas in stochastic processes — the evolution of probability 
for a Markov process, and the simulation of a random variable by a 
unit uniform random number. 


(a) Making use of the Markov property, show that for time-independent 
reaction events $\nu(n,t) = \nu(n)$, integration of the 
probability $q^{*}(n,t;dt) = 1 - q(n,t;dt)$ gives 

$$q^{*}(n,t;\tau) = \exp\left[-a(n)\,\tau\right],$$


as quoted in the main text (Eq. 4.29). 
(b) Repeat part 6a for time dependent reaction events v (n,t). 


(c) Write a stochastic simulation algorithm to generate realizations 
of the stochastic process that describes the Brusselator in a 
growing cell. That is, repeat the example in the text, but with 
$\Omega$ a time-dependent quantity. In bacterial cell growth, the 
volume grows approximately exponentially over a cell-cycle, 
then divides more or less symmetrically. 


i. As a first approximation, assume perfect division of the 
cell volume and perfect partitioning of the cell contents 
into daughter cells. 

ii. Code a routine to allow both the volume after division and 
the partitioned contents to be narrowly-peaked random 
variables. What distribution will you choose for these two 
variables? 


7. Exploring stochastic dynamics: Stochastic models exhibit fea- 
tures that do not appear in their deterministic counter-parts. Some 
of these features are straight-forward, others are quite surprising. 


(a) Stable system. Consider a very simple system with constant 
synthesis and linear degradation, 

$$x \xrightarrow{\;\nu_1\;} x+1, \qquad \nu_1 = \alpha,$$
$$x \xrightarrow{\;\nu_2\;} x-1, \qquad \nu_2 = \beta\cdot x.$$

Starting with $x(0) = 0$, compute the deterministic trajectory 
for x(t). Generate stochastic simulation data for molecule 
numbers from about 10 — 10°, and plot these runs normalized 
to the same steady-state value. How does the relative magni- 
tude of the fluctuations scale with the number of molecules? 
Plot a histogram of the fluctuations around the steady-state — 
what distribution does it resemble? How does the half-width 
scale with the number of molecules? 


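(b) Multi-stable system. A “toggle switch” network consists of two 
mutually repressing species, $r_1$ and $r_2$. If $r_1$ is high, synthesis 
of $r_2$ is low, and, conversely, if $r_2$ is high, $r_1$ is kept low. 
A simple network describing this system is the following four 
reactions: 

$$r_1 \xrightarrow{\;\nu_1\;} r_1+1, \qquad \nu_1 = \alpha\cdot g_R(r_2/\Omega),$$
$$r_2 \xrightarrow{\;\nu_2\;} r_2+1, \qquad \nu_2 = \alpha\cdot g_R(r_1/\Omega),$$
$$r_1 \xrightarrow{\;\nu_3\;} r_1-1, \qquad \nu_3 = \beta\cdot\frac{r_1}{\Omega},$$
$$r_2 \xrightarrow{\;\nu_4\;} r_2-1, \qquad \nu_4 = \beta\cdot\frac{r_2}{\Omega},$$

where the function $g_R(x)$ is high if $x$ is low and low if $x$ is high. 
Suppose $g_R(x)$ takes the simple Hill-form, 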


AG), 
gR(x) = tae) 


where $f$ is the capacity, $K_R$ measures the repressor strength 
(the smaller the $K_R$, the less repressor necessary to reduce $g_R$), 
and $n$ is the cooperativity determining how abrupt the transition 
is from the high to low state. Nondimensionalize the 
deterministic rate equations corresponding to this system, and 
estimate a range of parameters for which the system exhibits 
bistability. Perform stochastic simulations of the model for 
parameters in the bistable regime, along with varying system size 
$\Omega$. What differences do you see comparing the stochastic and 
deterministic models? 

(c) Noise-induced oscillations. Read J. M. G. Vilar, H. Y. Kueh, 
N. Barkai, and S. Leibler (2002) Mechanisms of noise-resistance 
in genetic oscillators, Proceedings of the National Academy of 
Sciences USA 99: 15988-15992. Repeat the simulation of their 
model for the parameter choice leading to a deterministically 
stable system. 


CHAPTER 5 

PERTURBATION EXPANSION OF THE 
MASTER EQUATION 


The master equation derived in Chapter 3 provides a foundation for most 
applications of stochastic processes to physical phenomena. Although it 
is more tractable than the Chapman-Kolmogorov equation, it is still rare 
to find an exact solution. One possibility is to adopt the Fokker-Planck 
equation as an approximate evolution equation, as in Chapter 6. We shall 
show in this Chapter that this is the first-step in a systematic analytic 
approximation scheme. 


5.1 Linear Noise Approximation (LNA) 


• N. G. van Kampen (1976) “Expansion of the Master equation,” Advances in 
Chemical Physics 34: 245. 


Often we can gain a better sense of a particular model by examining 
certain limiting regimes. The approximation method that we describe in 
this section examines system behavior in the limit of large numbers of 
reactant molecules. 

We have already seen that as the number of molecules increases, the 
system evolution becomes more smooth and the deterministic formulation 
becomes more appropriate (Figure 3.5). The linear noise approximation 
exploits this behavior and rests upon the supposition that the 
deterministic evolution of the reactant concentrations, call them $\mathbf{x}$, can be 
meaningfully separated from the fluctuations, call them $\boldsymbol{\alpha}$, and that these 
fluctuations scale roughly as the square-root of the number of molecules. We 
introduce an extensive parameter $\Omega$ that carries the units of volume and 
is directly proportional to the molecule numbers, allowing the molecule 
numbers to be written 

$$n_i = \Omega\,x_i + \sqrt{\Omega}\,\alpha_i. \tag{5.1}$$

Figure 5.1: The Linear Noise Approximation. a) The microscopic 
fluctuations are separated from the macroscopic evolution of the system 
by re-writing the probability density for the whole state $P(\mathbf{n},t)$ as a 
distribution for the fluctuations $\Pi(\boldsymbol{\alpha},t)$ centered on the macroscopic trajectory 
$\mathbf{x}(t)$. b) The discrete state space is smeared into a continuum by replacing 
the discrete step-operator $\mathbf{E}$ by a continuous differential operator. 

We are led to the square-root scaling of the fluctuations by the 
suggestion from Poisson statistics (see Eq. 4.16 on page 83). Recall that 
for a Poisson process, the fractional deviation is inversely proportional to 
the square-root of the number of molecules. The picture 
that underlies Eq. 5.1 is that of a deterministic, reproducible trajectory 
surrounded by a cloud of fluctuations. We would like a set of equations 
that govern the change in the deterministic part $\mathbf{x}$ and an equation that 
governs the change in the probability distribution of the fluctuations, call 
it $\Pi(\boldsymbol{\alpha},t)$, centered upon $\mathbf{x}$ (Figure 5.1a). 

With the propensity vector $\boldsymbol{\nu}$ and the stoichiometry matrix $\mathbf{S}$, we are 
in a position to write the master equation in a compact and convenient 
manner: for a network of $R$ reactions involving $N$ species the master 
equation is, 

$$\frac{dP(\mathbf{n},t)}{dt} = \Omega\sum_{j=1}^{R}\left[\left(\prod_{i=1}^{N}\mathbf{E}_i^{-S_{ij}}\right) - 1\right]\nu_j(\mathbf{n},\Omega)\,P(\mathbf{n},t), \tag{5.2}$$

(where we have used the step-operator $\mathbf{E}_i^k$ defined in Eq. 4.7). We have 
repeatedly emphasized that if the transition probabilities $\nu_j$ are nonlinear 
functions, then there is no systematic way to obtain an exact solution of 
the master equation, and we must resort to approximation methods. The 
linear noise approximation, which is the subject of this section, proceeds 
in three steps. 


1. First, we replace the full probability distribution $P(\mathbf{n},t)$ by the 
probability distribution for the fluctuations $\Pi(\boldsymbol{\alpha},t)$ centered on the 
macroscopic trajectory $\mathbf{x}$, 

$$P(\mathbf{n},t) \mapsto \Omega^{-N/2}\,\Pi(\boldsymbol{\alpha},t). \tag{5.3}$$

The pre-factor $\Omega^{-N/2}$ comes from the normalization of the probability 
distribution. 


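2. Recall that what makes the master equation difficult to solve exactly 
is the discrete evolution over state-space characterized by the 
step-operator $\mathbf{E}_i^k$. To make headway, we must find some continuous 
representation of the action of the operator. To that end, consider 
the action of the operator: it increments the $i^{\text{th}}$ species by an integer 
$k$. Using the assumption above (Eq. 5.1), we write, 

$$\mathbf{E}_i^k f(\ldots,n_i,\ldots) = f(\ldots,n_i+k,\ldots) = f\left(\ldots,\Omega x_i + \sqrt{\Omega}\left(\alpha_i + \frac{k}{\sqrt{\Omega}}\right),\ldots\right). \tag{5.4}$$

The term $k/\sqrt{\Omega}$ becomes negligibly small as $\Omega \to \infty$, suggesting a 
Taylor series around $k/\sqrt{\Omega} = 0$, 

$$f\left(\ldots,\Omega x_i + \sqrt{\Omega}\left(\alpha_i + \frac{k}{\sqrt{\Omega}}\right),\ldots\right) \approx f(\ldots,n_i,\ldots) + \frac{k}{\sqrt{\Omega}}\frac{\partial f}{\partial\alpha_i} + \frac{k^2}{2\,\Omega}\frac{\partial^2 f}{\partial\alpha_i^2} + \ldots \tag{5.5}$$

allowing us to approximate the discrete step-operator by a 
continuous differential operator (Figure 5.1b), 

$$\mathbf{E}_i^k \approx \left[1 + \frac{k}{\sqrt{\Omega}}\frac{\partial}{\partial\alpha_i} + \frac{k^2}{2\,\Omega}\frac{\partial^2}{\partial\alpha_i^2} + \ldots\right]. \tag{5.6}$$

(Compare this with the Kramers-Moyal expansion, Eq. 6.11 on 
page 126). 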
3. Finally, to remain consistent in our perturbation scheme, we must 
likewise expand the propensities in the limit $\Omega \to \infty$, 

$$\nu_j\!\left(\frac{\mathbf{n}}{\Omega}\right) \approx \tilde{\nu}_j(\mathbf{x}) + \frac{1}{\sqrt{\Omega}}\sum_{i=1}^{N}\frac{\partial\tilde{\nu}_j}{\partial x_i}\,\alpha_i + \ldots \tag{5.7}$$

where $\tilde{\nu}_j(\mathbf{x})$ are the macroscopic propensities defined in the limit 
that the molecule numbers go to infinity ($\Omega \to \infty$), while the 
concentration remains fixed ($\mathbf{n}/\Omega$ constant) (see p. 72), 

$$\tilde{\nu}_j(\mathbf{x}) = \lim_{\Omega\to\infty}\nu_j\!\left(\frac{\mathbf{n}}{\Omega}\right). \tag{5.8}$$

It is the expansion of the propensities that distinguishes the 
linear noise approximation from the Kramers-Moyal expansion. The 
result is a consistent approximation scheme with a particularly 
simple Fokker-Planck equation governing the probability distribution 
of the fluctuations, as we show below. 
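
The second step can be checked numerically. The sketch below compares the exact action of the step-operator on a smooth test function with the truncated differential operator of Eq. 5.6. The choice of a unit Gaussian as the test function, the evaluation point and the step size are assumptions made only for illustration.

```python
import math


def gaussian(alpha):
    """Smooth test function standing in for the fluctuation density."""
    return math.exp(-0.5 * alpha * alpha)


def d_gaussian(alpha):
    """First derivative of the test function."""
    return -alpha * gaussian(alpha)


def d2_gaussian(alpha):
    """Second derivative of the test function."""
    return (alpha * alpha - 1.0) * gaussian(alpha)


def step_exact(alpha, k, omega):
    """Exact action of the step-operator: shift alpha by k / sqrt(omega)."""
    return gaussian(alpha + k / math.sqrt(omega))


def step_truncated(alpha, k, omega):
    """Differential-operator approximation, truncated after second order."""
    eps = k / math.sqrt(omega)
    return gaussian(alpha) + eps * d_gaussian(alpha) + 0.5 * eps**2 * d2_gaussian(alpha)


errors = {}
for omega in (10, 100, 1000, 10000):
    errors[omega] = abs(step_exact(0.7, 2, omega) - step_truncated(0.7, 2, omega))
    print(omega, errors[omega])
```

The truncation error is dominated by the neglected third-order term, so it should fall roughly as $\Omega^{-3/2}$: increasing $\Omega$ by a factor of 100 reduces the error by about a factor of 1000.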


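Putting all of this together, taking care to write $\partial P/\partial t$ using the chain rule¹, 

$$\frac{\partial P}{\partial t} = \Omega^{-N/2}\left[\frac{\partial\Pi}{\partial t} - \sqrt{\Omega}\sum_{i=1}^{N}\frac{dx_i}{dt}\frac{\partial\Pi}{\partial\alpha_i}\right], \tag{5.9}$$

we collect Eq. 5.2 in like powers of $\sqrt{\Omega}$, taking the limit $\Omega \to \infty$. To 
zero’th order ($\Omega^0$), we have, 

$$\Omega^0: \quad \frac{dx_i}{dt}\frac{\partial\Pi}{\partial\alpha_i} = [\mathbf{S}\cdot\boldsymbol{\nu}]_i\,\frac{\partial\Pi}{\partial\alpha_i}. \tag{5.10}$$

This system of equations is identically satisfied if $\mathbf{x}$ obeys the deterministic 
rate equations, 

$$\frac{dx_i}{dt} = [\mathbf{S}\cdot\boldsymbol{\nu}]_i \equiv f_i(\mathbf{x}). \tag{5.11}$$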


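At the next order, $\sqrt{\Omega}^{\,-1}$, we have the equation characterizing the 
probability distribution for the fluctuations, 

$$\sqrt{\Omega}^{\,-1}: \quad \frac{\partial\Pi}{\partial t} = -\sum_{i,j}\Gamma_{ij}\,\partial_i(\alpha_j\Pi) + \frac{1}{2}\sum_{i,j}D_{ij}\,\partial_{ij}\Pi, \tag{5.12}$$

where $\partial_i \equiv \frac{\partial}{\partial\alpha_i}$ and, 

$$\Gamma_{ij}(t) = \left.\frac{\partial f_i}{\partial x_j}\right|_{\mathbf{x}(t)}, \qquad \mathbf{D} = \mathbf{S}\cdot\mathrm{diag}[\boldsymbol{\nu}]\cdot\mathbf{S}^{T}. \tag{5.13}$$

¹ There is a missing step here. The chain rule will give an expression involving the 
time derivatives of $\boldsymbol{\alpha}$. To obtain the expression involving $d\mathbf{x}/dt$, note that the time 
derivative of $P$ is taken with $\mathbf{n}$ fixed; i.e., $\frac{d\mathbf{n}}{dt} = 0 \Rightarrow \frac{d\mathbf{x}}{dt} = -\frac{1}{\sqrt{\Omega}}\frac{d\boldsymbol{\alpha}}{dt}$. 
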

We now have in hand a system of ordinary differential equations that 
govern the deterministic evolution of the system, which happen to coincide 
with the macroscopic reaction rate equations. We also have a partial 
differential equation that characterizes the probability distribution of the 
fluctuations. Some comments are in order: 


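1. The equation describing $\Pi(\boldsymbol{\alpha},t)$, Eq. 5.12, is a special sub-class of 
Fokker-Planck equations since the coefficient matrices $\boldsymbol{\Gamma}$ and $\mathbf{D}$ are 
independent of $\boldsymbol{\alpha}$. You can prove (see Exercise 1) that for linear 
drift and constant diffusion coefficient matrices, the solution 
distribution is Gaussian for all time. Furthermore, the moments of 
$\boldsymbol{\alpha}$ are easily computed from Eq. 5.12 by multiplying with $\alpha_i$ and 
integrating by parts to give, 

$$\frac{d\langle\boldsymbol{\alpha}\rangle}{dt} = \boldsymbol{\Gamma}\cdot\langle\boldsymbol{\alpha}\rangle. \tag{5.14}$$

If we choose the initial condition of $\mathbf{x}$ to coincide with the initial 
state of the system $\mathbf{n}_0$ (i.e. $\langle\boldsymbol{\alpha}(0)\rangle = 0$), then $\langle\boldsymbol{\alpha}\rangle = 0$ for all 
time. Without loss of generality, then, we set $\langle\boldsymbol{\alpha}\rangle = 0$. For the 
covariance $\Xi_{ij} = \langle\alpha_i\alpha_j\rangle - \langle\alpha_i\rangle\langle\alpha_j\rangle = \langle\alpha_i\alpha_j\rangle$, multiplication of the 
Fokker-Planck equation, Eq. 5.12, by $\alpha_i\alpha_j$ and integration by parts 
gives, 

$$\frac{d\boldsymbol{\Xi}}{dt} = \boldsymbol{\Gamma}\cdot\boldsymbol{\Xi} + \boldsymbol{\Xi}\cdot\boldsymbol{\Gamma}^{T} + \mathbf{D}. \tag{5.15}$$

The covariance determines the width of $\Pi(\boldsymbol{\alpha},t)$ as it moves along 
$\mathbf{x}(t)$. The full distribution satisfying Eq. 5.12 is the Gaussian, 

$$\Pi(\boldsymbol{\alpha},t) = \left[(2\pi)^{N}\det\boldsymbol{\Xi}(t)\right]^{-1/2}\exp\left[-\frac{1}{2}\,\boldsymbol{\alpha}^{T}\cdot\boldsymbol{\Xi}^{-1}(t)\cdot\boldsymbol{\alpha}\right], \tag{5.16}$$

with covariance matrix $\boldsymbol{\Xi}(t)$ determined by Eq. 5.15. 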


2. We have used $\Omega$ as an ordering parameter in our perturbation 
expansion of the master equation. Notice, however, that it is not 
the single parameter $\Omega$ that determines the reliability of the 
expansion, particularly in higher-dimensional systems. The condition 
that must be satisfied is that the standard deviation of the 
fluctuations is small compared with the mean. After non-dimensionalizing 
the model equations, a more convenient measure is the requirement 
that the elements of the diffusion coefficient matrix $\mathbf{D}$ be small. 


3. The Fokker-Planck equation provides an even deeper insight into the 
physics of the process. Notice $\boldsymbol{\Gamma}$ is simply the Jacobian of the 
deterministic system, evaluated pointwise along the macroscopic 
trajectory. As such, it represents the local damping or dissipation of the 
fluctuations. The diffusion coefficient matrix $\mathbf{D}$ tells us how much 
the microscopic system is changing at each point along the 
trajectory. As such, it represents the local fluctuations. The balance of 
these two competing effects, dissipation and fluctuation, occurs at 
steady-state and is described by the fluctuation-dissipation relation 
(Eq. 5.15 with $d\boldsymbol{\Xi}/dt = 0$), 

$$\boldsymbol{\Gamma}_s\cdot\boldsymbol{\Xi}_s + \boldsymbol{\Xi}_s\cdot\boldsymbol{\Gamma}_s^{T} + \mathbf{D}_s = 0 \tag{5.17}$$

where each of the matrices is evaluated at a stable equilibrium point 
of the deterministic system. Compare Eq. 5.12 with Kramers’ equation 
for a Brownian particle trapped in a potential well (Section 6.7 
on page 143). You will see that $\boldsymbol{\Gamma}$ plays the role of the curvature of 
the potential (the spring constant), and $\mathbf{D}$ plays the role of 
temperature. 


4. It is straightforward to show that the autocorrelation of the 
fluctuations about the steady-state is exponential in time 
(Exercise 3), 

$$\langle\boldsymbol{\alpha}(t)\cdot\boldsymbol{\alpha}^{T}(0)\rangle = \exp\left[\boldsymbol{\Gamma}_s t\right]\cdot\boldsymbol{\Xi}_s. \tag{5.18}$$


5. Notice the role stoichiometry plays in the magnitude of the 
fluctuations through the coefficient matrix $\mathbf{D} = \mathbf{S}\cdot\mathrm{diag}[\boldsymbol{\nu}]\cdot\mathbf{S}^{T}$. Although $\boldsymbol{\Gamma}$ 
is unchanged by lumping together the propensity and stoichiometry, 
the fluctuation matrix $\mathbf{D}$ is not! (See Exercise 2.) 
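
The last comment can be illustrated with a small numerical sketch. It compares a single reaction that adds a burst of molecules at a given rate with a "lumped" description that adds one molecule at a proportionally higher rate. The helper names (`drift`, `diffusion`) and the numbers used are hypothetical and chosen only for illustration.

```python
def drift(S, nu):
    """Macroscopic drift S . nu (right-hand side of the rate equations)."""
    return [sum(S[i][j] * nu[j] for j in range(len(nu))) for i in range(len(S))]


def diffusion(S, nu):
    """Diffusion matrix D = S . diag(nu) . S^T."""
    n_species, n_reactions = len(S), len(nu)
    return [
        [sum(S[i][k] * nu[k] * S[j][k] for k in range(n_reactions))
         for j in range(n_species)]
        for i in range(n_species)
    ]


burst = 5    # molecules added per synthesis event (illustrative value)
rate = 2.0   # synthesis events per unit time (illustrative value)

# Bursty description: one event adds `burst` molecules.
S_burst, nu_burst = [[burst]], [rate]
# Lumped description: one molecule per event, at `burst` times the rate.
S_lumped, nu_lumped = [[1]], [burst * rate]

print(drift(S_burst, nu_burst), drift(S_lumped, nu_lumped))
print(diffusion(S_burst, nu_burst), diffusion(S_lumped, nu_lumped))
```

Both descriptions give the same drift, and therefore the same deterministic rate equation and Jacobian, but the diffusion coefficient of the bursty description is larger by a factor equal to the burst size.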


5.1.1 Example — Nonlinear synthesis 


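Consider a one-state model with a general synthesis rate, $g[x]$, but linear 
degradation, 

$$n \xrightarrow{\;\nu_1\;} n+b, \qquad \nu_1 = g\!\left[\frac{n}{\Omega}\right],$$
$$n \xrightarrow{\;\nu_2\;} n-1, \qquad \nu_2 = \beta\cdot\frac{n}{\Omega},$$

where $n$ is the number of molecules and $\Omega$ is the volume of the reaction 
vessel (so that $X = n/\Omega$ is the concentration). The master equation for 
this process is, 

$$\frac{dP(n,t)}{dt} = \Omega\left\{\mathbf{E}^{-b} - 1\right\}g\!\left[\frac{n}{\Omega}\right]\cdot P(n,t) + \beta\left\{\mathbf{E}^{1} - 1\right\}n\,P(n,t). \tag{5.19}$$

Substituting the ansatz (Eq. 5.1), along with preceding approximations 
(Eqs. 5.3, 5.6 and 5.8), into the master equation (Eq. 5.19), 

$$\frac{1}{\sqrt{\Omega}}\left\{\frac{\partial\Pi}{\partial t} - \sqrt{\Omega}\,\frac{dx}{dt}\frac{\partial\Pi}{\partial\alpha}\right\} = \Omega\left\{-\frac{b}{\sqrt{\Omega}}\frac{\partial}{\partial\alpha} + \frac{b^2}{2\,\Omega}\frac{\partial^2}{\partial\alpha^2}\right\}\left\{g[x] + \frac{1}{\sqrt{\Omega}}\,g'[x]\,\alpha\right\}\frac{1}{\sqrt{\Omega}}\Pi(\alpha,t)$$
$$\qquad + \beta\left\{\frac{1}{\sqrt{\Omega}}\frac{\partial}{\partial\alpha} + \frac{1}{2\,\Omega}\frac{\partial^2}{\partial\alpha^2}\right\}\left\{\Omega x + \sqrt{\Omega}\,\alpha\right\}\frac{1}{\sqrt{\Omega}}\Pi(\alpha,t).$$

Cancelling a common factor of $1/\sqrt{\Omega}$ from both sides, and collecting terms 
in like powers of $1/\sqrt{\Omega}$, to leading-order, 

$$-\frac{dx}{dt}\frac{\partial\Pi}{\partial\alpha} = -\left\{b\cdot g[x] - \beta\cdot x\right\}\frac{\partial\Pi}{\partial\alpha}.$$

This equation is satisfied if the deterministic trajectory $x(t)$ obeys the 
deterministic rate equation, 

$$\frac{dx}{dt} = b\cdot g[x] - \beta\cdot x. \tag{5.20}$$

Although the approximation is called the linear noise approximation, the 
full nonlinearity in the deterministic trajectory is retained. To first-order 
in $1/\sqrt{\Omega}$, the fluctuations $\Pi(\alpha,t)$ obey a linear Fokker-Planck equation, 

$$\frac{\partial\Pi}{\partial t} = -\left\{b\cdot g'[x] - \beta\right\}\frac{\partial(\alpha\Pi)}{\partial\alpha} + \frac{\left\{b^2\cdot g[x] + \beta\cdot x\right\}}{2}\frac{\partial^2\Pi}{\partial\alpha^2}. \tag{5.21}$$

The solution to this equation is straightforward: it is simply a Gaussian 
centered on the trajectory $x(t)$ determined by Eq. 5.20, with covariance 
given by (Eq. 5.15 applied to this one-dimensional system), 

$$\frac{d\langle\langle\alpha^2\rangle\rangle}{dt} = 2\left\{b\cdot g'[x] - \beta\right\}\langle\langle\alpha^2\rangle\rangle + \left\{b^2\cdot g[x] + \beta\cdot x\right\}.$$

The two equations, Eq. 5.20 and 5.21, provide an approximation of 
the full time-dependent evolution of the probability distribution $P(n,t)$ 
obeying the master equation (Eq. 5.19). The series neglects terms of 
order $1/\Omega$ and higher. If these terms are retained, then the coefficients 
in Eq. 5.21 become nonlinear in $\alpha$, and higher-order partial derivatives 
appear, i.e. the distribution $\Pi(\alpha,t)$ is no longer Gaussian (see Exercise 5). 
At steady-state, the mean $x^s$ satisfies the algebraic condition, 

$$b\cdot g[x^s] = \beta\cdot x^s,$$

and the variance is given by, 

$$\langle\langle\alpha^2\rangle\rangle = \frac{(b+1)}{2}\cdot\frac{\beta\cdot x^s}{\beta - b\cdot g'[x^s]}. \tag{5.22}$$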
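
As a numerical sketch of the steady-state results, the code below finds $x^s$ by bisection and evaluates the variance in two ways: from Eq. 5.22, and directly from the steady-state balance between the drift and diffusion coefficients of Eq. 5.21. The synthesis function $g[x] = 1/(1+x)$ and the parameter values are assumptions chosen only for illustration; they do not appear in the text.

```python
def g(x):
    """Hypothetical synthesis function (negative autoregulation)."""
    return 1.0 / (1.0 + x)


def g_prime(x):
    """Derivative of the hypothetical synthesis function."""
    return -1.0 / (1.0 + x) ** 2


def steady_state(b, beta, lo=0.0, hi=100.0, tol=1e-12):
    """Solve b*g(x) = beta*x by bisection on the bracket [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if b * g(mid) - beta * mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


b, beta = 4, 1.0
x_s = steady_state(b, beta)

# Variance from Eq. 5.22.
var_522 = 0.5 * (b + 1) * beta * x_s / (beta - b * g_prime(x_s))

# Variance from setting the time derivative of the covariance to zero:
# 0 = 2*(b*g' - beta)*var + (b^2*g + beta*x).
var_direct = (b**2 * g(x_s) + beta * x_s) / (2.0 * (beta - b * g_prime(x_s)))

print(x_s, var_522, var_direct)
```

The two expressions agree because $b\cdot g[x^s] = \beta\cdot x^s$ at steady-state, which is how the factor $(b+1)$ arises in Eq. 5.22.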


What is the system size? 


In the derivation of the linear noise approximation, the parameter $\Omega$ is 
used to order the terms in the perturbation expansion. The validity of 
the expansion depends upon the magnitude of $\Omega$ only indirectly; what is 
required is that $\Omega x \gg \sqrt{\Omega}\,\alpha$. More informally, it is essential to find an 
extensive parameter that quantifies the relative magnitude of the jump 
size in the reaction numbers (and take the limit that this jump magnitude 
becomes small). What precisely that parameter is will depend upon the 
problem at hand — for many electrical applications it is proportional to the 
capacitance in the circuit, for many chemistry problems it is proportional 
to the volume. 

For example, consider a model of an autoactivator at the level of 
protein (after time-averaging the promoter and mRNA kinetics), and further 
suppose the synthesis of the activator is expressed as a Hill function so 
that the deterministic rate equation governing the concentration of $A$ is, 

$$\frac{dA}{dt} = A_0\cdot g[A/K_A] - A,$$

where for convenience time is scaled relative to the degradation rate of 
$A$, and $A_0$ is the maximal synthesis rate of $A$. This model is a specific 
example of the nonlinear synthesis model described in detail above. The 
function $g[A/K_A]$ depends upon the dissociation constant $K_A$ (in units 
of concentration). To be specific, let 

$$g[x] = x^2/(1 + x^2).$$


The relevant concentration scale in the deterministic model is the 
dissociation constant $K_A$, and the unitless parameter that fully characterizes 
the behaviour of the deterministic model is the ratio $A_0/K_A$. Physically, 
the dissociation constant $K_A$ determines the threshold beyond which a 
bistable positive feedback loop switches from the low state to the high 
state (or vice-versa). In an electronic feedback loop with a microphone 
pointed at a speaker, $K_A$ would correspond to the sensitivity of the 
microphone to noise from the speaker, determining how loud a disturbance 
from the speaker must be in order to initiate a feedback whine. 

For the linear noise approximation to be valid, we need the fractional 
deviation of the molecule numbers to be small. It is straightforward to 
analyze this condition around steady-state. The fractional deviation is, 

$$\sqrt{\frac{\langle\langle n^2\rangle\rangle}{\langle n\rangle^2}} = \sqrt{\frac{\langle\langle\alpha^2\rangle\rangle}{\Omega\,x^2}}.$$

For clarity, we make the simplifying assumption that $g'[x^s] \ll 1$ after 
scaling time relative to the degradation rate. Then, using Eq. 5.22, 

$$\sqrt{\frac{\langle\langle n^2\rangle\rangle}{\langle n\rangle^2}} \approx \sqrt{\frac{(b+1)}{2}\cdot\frac{1}{\Omega\,x^s}}.$$

Non-dimensionalizing the mean $x^s$ using the characteristic concentration 
$K_A$, the fractional deviation will remain small provided the unitless 
parameter, 

$$\Delta = \frac{(b+1)}{2}\cdot\frac{1}{\Omega K_A},$$

remains small. What is the physical interpretation of $\Delta$? The first factor, 
$(b+1)/2$, is the average change in numbers when a reaction occurs (1 
for degradation, $b$ for synthesis). The second factor is perhaps the more 
important factor: it is the dissociation constant (in units of 
concentration) multiplied by the system volume. The factor $1/(\Omega K_A)$ expresses a 
characteristic number scale in the system. If $K_A$ was, for example, 25 
nanomolar and you were working in E. coli ($\Omega \sim 10^{-15}$ L), then $K_A\Omega$ is 
about 25 molecules. So long as the average jump size $(b+1)/2$ is much less 
than 25, the linear noise approximation provides a useful approximation. 

In the autoactivator example, the state space is one-dimensional and 
so the number scale is obvious. For a more complicated model, it may 
not be so obvious what dimensionless parameters must be small for the 
approximation to remain valid. One strategy is to identify in the diffusion 
matrix $\mathbf{D}$ (Eq. 5.13) dimensionless parameters like $\Delta$ that will control the 
magnitude of the variance. 

5.1.2 Approximation of the Brusselator model 


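To illustrate the linear noise approximation, we return to the Brusselator 
example introduced on page 88. We use the ansatz $n_i = \Omega x_i + \sqrt{\Omega}\,\alpha_i$ to 
re-write the molecule numbers $n_i$ as concentrations $x_i$ and fluctuations 
$\alpha_i$. Expanding the step-operator and propensity functions in the master 
equation for the Brusselator, Eq. 4.25, then collecting terms in powers of 
$\Omega^{-1/2}$, at $\Omega^{0}$ we have, 

$$-\frac{dx_1}{dt}\frac{\partial\Pi}{\partial\alpha_1} - \frac{dx_2}{dt}\frac{\partial\Pi}{\partial\alpha_2} = \left[-a x_1^2 x_2 + (1+b)\,x_1 - 1\right]\frac{\partial\Pi}{\partial\alpha_1} + \left[a x_1^2 x_2 - b x_1\right]\frac{\partial\Pi}{\partial\alpha_2}.$$

Identifying $x_1$ and $x_2$ with the macroscopic trajectory of the system, this 
term vanishes identically, because from Eq. 4.24, 

$$\frac{dx_1}{dt} = 1 + a x_1^2 x_2 - (1+b)\,x_1,$$
$$\frac{dx_2}{dt} = -a x_1^2 x_2 + b x_1.$$

The next term in the expansion, at $\Omega^{-1/2}$, is 

$$\frac{\partial\Pi}{\partial t} = -\frac{\partial}{\partial\alpha_1}\left\{\left[\left(2a x_1 x_2 - (b+1)\right)\alpha_1 + a x_1^2\,\alpha_2\right]\Pi\right\} - \frac{\partial}{\partial\alpha_2}\left\{\left[\left(b - 2a x_1 x_2\right)\alpha_1 - a x_1^2\,\alpha_2\right]\Pi\right\}$$
$$\qquad + \frac{1}{2}\left\{\left[(b+1)\,x_1 + a x_1^2 x_2 + 1\right]\frac{\partial^2\Pi}{\partial\alpha_1^2} + \left[b x_1 + a x_1^2 x_2\right]\frac{\partial^2\Pi}{\partial\alpha_2^2}\right\} + \left[-b x_1 - a x_1^2 x_2\right]\frac{\partial^2\Pi}{\partial\alpha_1\partial\alpha_2}.$$

This is a Fokker-Planck equation with coefficients linear in $\alpha_i$. Writing 
the above in more compact notation, 

$$\frac{\partial\Pi}{\partial t} = -\sum_{i,j}\Gamma_{ij}\,\partial_i(\alpha_j\Pi) + \frac{1}{2}\sum_{i,j}D_{ij}\,\partial_{ij}\Pi, \tag{5.23}$$

with, 

$$\boldsymbol{\Gamma} = \begin{bmatrix} 2a x_1 x_2 - (b+1) & a x_1^2 \\ b - 2a x_1 x_2 & -a x_1^2 \end{bmatrix}, \qquad \mathbf{D} = \begin{bmatrix} 1 + (b+1)\,x_1 + a x_1^2 x_2 & -b x_1 - a x_1^2 x_2 \\ -b x_1 - a x_1^2 x_2 & b x_1 + a x_1^2 x_2 \end{bmatrix}.$$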
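
In this form, the mean and variance of $\alpha_1$ and $\alpha_2$ are easily derived, since: 

$$\frac{d}{dt}\langle\alpha_i\rangle = \sum_j\Gamma_{ij}\langle\alpha_j\rangle,$$
$$\frac{d}{dt}\langle\alpha_i\alpha_j\rangle = \sum_k\Gamma_{ik}\langle\alpha_k\alpha_j\rangle + \sum_k\Gamma_{jk}\langle\alpha_i\alpha_k\rangle + D_{ij},$$

to give: 

$$\frac{d}{dt}\langle\alpha_1\rangle = \left[2a x_1 x_2 - (b+1)\right]\langle\alpha_1\rangle + a x_1^2\langle\alpha_2\rangle,$$
$$\frac{d}{dt}\langle\alpha_2\rangle = \left[b - 2a x_1 x_2\right]\langle\alpha_1\rangle - a x_1^2\langle\alpha_2\rangle,$$

and, 

$$\frac{d}{dt}\langle\alpha_1^2\rangle = 2\left[2a x_1 x_2 - (b+1)\right]\langle\alpha_1^2\rangle + 2a x_1^2\langle\alpha_1\alpha_2\rangle + 1 + (b+1)\,x_1 + a x_1^2 x_2,$$
$$\frac{d}{dt}\langle\alpha_1\alpha_2\rangle = \left[b - 2a x_1 x_2\right]\langle\alpha_1^2\rangle + \left[2a x_1 x_2 - (b+1) - a x_1^2\right]\langle\alpha_1\alpha_2\rangle + a x_1^2\langle\alpha_2^2\rangle - b x_1 - a x_1^2 x_2,$$
$$\frac{d}{dt}\langle\alpha_2^2\rangle = 2\left[b - 2a x_1 x_2\right]\langle\alpha_1\alpha_2\rangle - 2a x_1^2\langle\alpha_2^2\rangle + b x_1 + a x_1^2 x_2.$$

Stable regime, b < 1 + a: In the parameter regime where the 
macroscopic solutions are asymptotically stable, and tend to $(x_1^{ss}, x_2^{ss}) \to (1, b/a)$, 
the mean of the fluctuations will vanish asymptotically, $(\langle\alpha_1\rangle^{ss}, \langle\alpha_2\rangle^{ss}) \to (0,0)$, 
though the variance remains bounded and non-zero, 

$$\langle\alpha_1^2\rangle^{ss} = \frac{b + (1+a)}{(1+a) - b}, \qquad \langle\alpha_1\alpha_2\rangle^{ss} = \frac{-2b}{(1+a) - b}, \qquad \langle\alpha_2^2\rangle^{ss} = \frac{b}{a}\cdot\frac{b + (1+a)}{(1+a) - b}.$$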
These expressions obviously become meaningless for a choice of 
parameters outside the stable region, i.e., b > 1 + a. If the macroscopic solution 
is unstable, the variance diverges and after a transient period, the 
variance will exceed $\Omega$, so that the ordering implied by our original ansatz 
$n_i = \Omega x_i + \sqrt{\Omega}\,\alpha_i$ breaks down. This is certainly true for exponentially 
divergent macroscopic trajectories, but for orbitally stable trajectories, it 
is only the uncertainty in the phase that is divergent. To examine the 
fluctuations in a system on a periodic orbit, more sophisticated techniques 
are required (see Section 11.2 on page 244). 
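
The steady-state covariance quoted for the stable regime can be checked against the fluctuation-dissipation relation (Eq. 5.17). The sketch below evaluates the Jacobian and diffusion matrices of the Brusselator at the fixed point $(1, b/a)$ and computes the residual of Eq. 5.17. The function names and the three parameter pairs (taken from the captions of Figures 5.2 and 5.3) are used only for illustration.

```python
def matmul(A, B):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(A):
    """Transpose of a 2 x 2 matrix."""
    return [[A[j][i] for j in range(2)] for i in range(2)]


def brusselator_matrices(a, b):
    """Jacobian and diffusion matrix evaluated at the fixed point (1, b/a)."""
    x1, x2 = 1.0, b / a
    gamma = [[2 * a * x1 * x2 - (b + 1), a * x1**2],
             [b - 2 * a * x1 * x2, -a * x1**2]]
    off_diagonal = -b * x1 - a * x1**2 * x2
    D = [[1 + (b + 1) * x1 + a * x1**2 * x2, off_diagonal],
         [off_diagonal, b * x1 + a * x1**2 * x2]]
    return gamma, D


def steady_covariance(a, b):
    """Steady-state covariance of the fluctuations in the stable regime."""
    c = (1 + a) - b
    s = b + (1 + a)
    return [[s / c, -2 * b / c],
            [-2 * b / c, (b / a) * s / c]]


def residual(a, b):
    """Gamma.Xi + Xi.Gamma^T + D, which should vanish at steady-state."""
    gamma, D = brusselator_matrices(a, b)
    xi = steady_covariance(a, b)
    left = matmul(gamma, xi)
    right = matmul(xi, transpose(gamma))
    return [[left[i][j] + right[i][j] + D[i][j] for j in range(2)]
            for i in range(2)]


for a, b in [(0.1, 0.2), (0.5, 1.0), (0.95, 1.90)]:
    print((a, b), residual(a, b))
```

For each stable parameter pair the residual vanishes to rounding error, while the entries of the covariance grow as $b \to 1 + a$ through the common factor $1/[(1+a) - b]$.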

It is instructive to compare the linear noise approximation with the 
results of stochastic simulation in order to contrast the two approaches. 

Figure 5.2: The stable system - far from the Hopf bifurcation. 
Below the critical line b = 1 + a, the macroscopic system is stable, with 
microscopic trajectories fluctuating about the fixed point, shown here at 
(n1,n2) = (1000, 2000). a) (a,b) = (0.1,0.2), (a = 10 in Figure 23 of 
Gillespie (1977)). The system is very stable - the inset shows the first 
and second standard deviations of the Gaussian variations predicted by 
the linear noise approximation. b) (a,b) = (0.5,1), (a = 2 in Figure 22 of 
Gillespie (1977)). The system is still stable, but fluctuations are more 
appreciable. In both figures, $\Omega$ = 1000, and the linear noise approximation 
describes the statistics of the fluctuations very well. 

Figure 5.3: The stable system - near to the Hopf bifurcation. 
As the parameters approach the critical line b = 1 + a, the fluctuations 
are enhanced substantially. Here a = 0.95, b = 1.90, very close to the 
parameters used in Figure 4.5. 

Since retention of terms up to first order in $\Omega^{-1/2}$ results in a linear 
Fokker-Planck equation, the probability density $\Pi(\alpha_1,\alpha_2,t)$ must necessarily be 
Gaussian (to this order in $\Omega$). In fact, at steady-state, the probability 
density of $\alpha_i$ can be explicitly written as the bivariate Gaussian 
distribution, 

$$\Pi^{ss}(\alpha_1,\alpha_2) = \frac{1}{2\pi\sqrt{\det\boldsymbol{\Xi}^{s}}}\exp\left[-\frac{1}{2}\,(\alpha_1,\alpha_2)\cdot\left(\boldsymbol{\Xi}^{s}\right)^{-1}\cdot(\alpha_1,\alpha_2)^{T}\right],$$

with 

$$\boldsymbol{\Xi}^{s} = \begin{bmatrix}\langle\alpha_1^2\rangle & \langle\alpha_1\alpha_2\rangle \\ \langle\alpha_1\alpha_2\rangle & \langle\alpha_2^2\rangle\end{bmatrix} = \frac{1}{(1+a) - b}\begin{bmatrix} b + (1+a) & -2b \\ -2b & \frac{b}{a}\left[b + (1+a)\right]\end{bmatrix}, \tag{5.24}$$

centred upon $(x_1^{ss}, x_2^{ss}) = (1, b/a)$. The fluctuations are therefore 
constrained to lie within some level curve of $\Pi^{ss}(\alpha_1,\alpha_2)$, described by the 
ellipse 

$$\frac{\left[(b + (1+a))\,((1+a) - b)\right]\alpha_1^2 + \frac{a}{b}\left[(b + (1+a))\,((1+a) - b)\right]\alpha_2^2 + 4a\left[(1+a) - b\right]\alpha_1\alpha_2}{b^2 + 2b - 2ab + 2a + a^2 + 1} = K.$$

The constant $K$ determines the fraction of probability contained under 
the surface $\Pi^{ss}(\alpha_1,\alpha_2)$ (Figure 5.2). Notice that the major and minor 
axes of the level curve ellipse are the eigenvectors of $\boldsymbol{\Xi}^{-1}$. As the system 
parameters approach the bifurcation line $b \to 1 + a$, the fluctuations 
become enhanced and the eigenvectors approach a limiting slope (Figure 
5.3 and Exercise 4). 

In the stable regime, the linear noise approximation provides an 
excellent approximation of the statistics of the fluctuations. For fluctuations 
around a limit-cycle, the linear noise approximation must be modified 
(see Section 11.2 on page 244). For the tracking of a trajectory through a 
multi-stable phase space, the linear noise approximation cannot be used 
because it relies on linearization about a unique stable macroscopic 
trajectory. In systems exhibiting a limit cycle or multistability, stochastic 
simulation can be performed easily. Nevertheless, as the number of molecules 
increases ($\Omega \to \infty$), stochastic simulation becomes prohibitively 
time-consuming, though the analytic approximation becomes more reliable. 
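
The orientation of the level-curve ellipse discussed above can be computed directly from the steady-state covariance of Eq. 5.24. The following sketch evaluates the slope of the major axis, which lies along the eigenvector of the covariance matrix with the largest eigenvalue (equivalently, the smallest eigenvalue of its inverse). The helper names and the choice of a = 0.95 are illustrative assumptions.

```python
import math


def covariance(a, b):
    """Steady-state covariance of the fluctuations, following Eq. 5.24."""
    c = (1 + a) - b
    s = b + (1 + a)
    return [[s / c, -2 * b / c],
            [-2 * b / c, (b / a) * s / c]]


def major_axis_slope(xi):
    """Slope d(alpha_2)/d(alpha_1) of the eigenvector with largest eigenvalue.

    Valid for a symmetric 2 x 2 matrix with a non-zero off-diagonal entry.
    """
    p, q, r = xi[0][0], xi[0][1], xi[1][1]
    largest = 0.5 * (p + r) + math.sqrt(0.25 * (p - r) ** 2 + q * q)
    # From (p - lambda) * v1 + q * v2 = 0.
    return (largest - p) / q


a = 0.95
slopes = {b: major_axis_slope(covariance(a, b)) for b in (1.0, 1.5, 1.9, 1.94)}
for b, slope in slopes.items():
    print(b, slope)
```

The slope is negative throughout the stable regime, reflecting the negative cross-correlation $\langle\alpha_1\alpha_2\rangle^{ss}$, and it changes only slowly as $b$ approaches $1 + a$ even though the size of the ellipse diverges.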

Figure 5.4: Central Dogma of Molecular Biology. A) RNA-polymerase 
transcribes DNA into messenger RNA (mRNA). Ribosomes 
translate mRNA into protein. B) The process of protein synthesis can 
be represented by a simplified model. Here, the block arrow denotes the 
DNA of the gene of interest. The gray ribbon denotes mRNA (m), and 
the green circle denotes protein (p). 


5.2 Example – ‘Burstiness’ in Protein Synthesis 


• M. Thattai and A. van Oudenaarden (2001) “Intrinsic noise in gene regulatory 
networks,” Proceedings of the National Academy of Sciences USA 98: 8614-8619. 


In cells, protein is synthesized via a two-step process: DNA is 
transcribed to messenger RNA (mRNA) and that mRNA is translated to 
protein. Deterministically, this process can be represented as a pair of linear, 
coupled differential equations for the concentrations of mRNA ($m$) and 
protein ($p$), 

$$\frac{dm}{dt} = \alpha_m - \beta_m m, \qquad \frac{dp}{dt} = \alpha_p m - \beta_p p, \tag{5.25}$$

where $\alpha_m$ is the rate of transcription, $\alpha_p$ is the rate of translation, and 
$\beta_m$ and $\beta_p$ are the rates of mRNA and protein degradation, respectively 
(Figure 5.4). Notice the steady-state values of the mRNA and protein 
levels are, 

$$m^s = \frac{\alpha_m}{\beta_m}, \quad\text{and}\quad p^s = \frac{\alpha_p\cdot m^s}{\beta_p} = \frac{\alpha_m\cdot\alpha_p}{\beta_m\cdot\beta_p}. \tag{5.26}$$

On a mesoscopic level, the deterministic model above is recast as a master 
equation for the number of mRNA ($n_1$) and protein ($n_2$) molecules, 


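$$\frac{\partial P(n_1,n_2,t)}{\partial t} = \alpha_m\left(\mathbf{E}_1^{-1} - 1\right)P(n_1,n_2,t) + \beta_m\left(\mathbf{E}_1^{1} - 1\right)n_1 P(n_1,n_2,t)$$
$$\qquad + \alpha_p\left(\mathbf{E}_2^{-1} - 1\right)n_1 P(n_1,n_2,t) + \beta_p\left(\mathbf{E}_2^{1} - 1\right)n_2 P(n_1,n_2,t), \tag{5.27}$$

where as above $\mathbf{E}$ is the step-operator: $\mathbf{E}_i^k f(n_i,n_j) = f(n_i+k,n_j)$. All of 
the transition rates are linear in the variables $n_1$, $n_2$, so in principle the full 
distribution $P(n_1,n_2,t)$ can be obtained. Using the moment generating 
function $Q(z_1,z_2,t)$ as in Section 4.1, the partial differential equation for 
$Q$ is, 

$$\frac{\partial Q}{\partial t} = \alpha_m\,(z_1 - 1)\,Q + \beta_m\,(1 - z_1)\frac{\partial Q}{\partial z_1} + \alpha_p\,z_1\,(z_2 - 1)\frac{\partial Q}{\partial z_1} + \beta_p\,(1 - z_2)\frac{\partial Q}{\partial z_2}. \tag{5.28}$$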

Despite being a linear partial differential equation, even at steady-state 
it is difficult to determine $Q^s(z_1,z_2)$ exactly. Often $Q$ is expanded as a 
Taylor series about the point $z_1 = z_2 = 1$, and using Eqs. 4.3 and 4.4, 
evolution equations for the first two moments of the distribution are 
obtained by treating $\varepsilon_1 = (z_1 - 1)$ and $\varepsilon_2 = (z_2 - 1)$ as small parameters 
and collecting terms of like powers in $\varepsilon_i$. The calculation is algebraically 
tedious; you are encouraged to try. 

Alternatively, one can use the linear noise approximation. We make 
the usual ansatz that the concentration can be partitioned into determin- 
istic and stochastic components, 


$$
m(t) = x_1(t) + \Omega^{-1/2} \alpha_1(t), \qquad p(t) = x_2(t) + \Omega^{-1/2} \alpha_2(t).
$$


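Focusing upon the steady-state, the mean mRNA and protein levels
correspond to the deterministic values, $\langle m \rangle = m^*$ and $\langle p \rangle = p^*$.

The steady-state covariance is determined by the fluctuation-dissipation
relation, Eq. 5.17. Solving for the steady-state covariance (writing
$\phi = \beta_m/\beta_p$),

$$
\begin{bmatrix} \langle m^2 \rangle & \langle mp \rangle \\ \langle mp \rangle & \langle p^2 \rangle \end{bmatrix}
=
\begin{bmatrix} m^* & \dfrac{p^*}{1+\phi} \\[2ex] \dfrac{p^*}{1+\phi} & p^* \left[ 1 + \dfrac{\alpha_p}{\beta_p} \dfrac{1}{(1+\phi)} \right] \end{bmatrix}.
\tag{5.29}
$$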
Figure 5.5: Burstiness of protein synthesis. A) Stochastic simulation
of the minimal model of protein synthesis (Eq. 5.27). Both the red and
the blue curve have the same mean ($\langle p \rangle = 50$), but distinctly different
variance. The difference is captured by the burstiness; $(\alpha_m, \alpha_p) = (0.1, 2)$
[blue] and $(0.01, 20)$ [red]. The degradation rates are the same for both:
$\beta_m^{-1} = 5$ mins and $\beta_p^{-1} = 50$ mins. B) Burstiness can be observed
experimentally. The Xie lab has developed sophisticated methods to observe
bursts of protein off of individual mRNA transcribed from a highly
repressed lac promoter. Trapping individual E. coli cells in a microfluidic
chamber, it is possible to observe the step-wise increase of β-gal off the
lacZ gene. Cai L, Friedman N, Xie XS (2006) Stochastic protein expression
in individual cells at the single molecule level. Nature 440:358-362.
The steady-state values $m^*$ and $p^*$ are simply the steady-state values
calculated from the deterministic rate equations (Eq. 5.26). Focusing
upon the protein, the fractional deviation of the fluctuations $\sqrt{\langle p^2 \rangle}/\langle p \rangle$
is given by,

$$
\frac{\langle p^2 \rangle}{\langle p \rangle^2} = \frac{1}{p^*} \left[ 1 + \frac{\alpha_p}{\beta_p} \frac{1}{(1+\phi)} \right].
$$
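In prokaryotes, the degradation rate of the mRNA is typically much
greater than the degradation rate of the protein, so $\phi \gg 1$, and the fractional
deviation becomes,

$$
\frac{\langle p^2 \rangle}{\langle p \rangle^2} \approx \frac{1}{p^*} \left[ 1 + \frac{\alpha_p}{\beta_m} \right] = \frac{1}{p^*} \left[ 1 + b \right], \tag{5.30}
$$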

where $b = \alpha_p/\beta_m$ is called the burstiness of protein synthesis. The
burstiness is a measure of the amplification of transcription noise by translation,
since each errant transcript is amplified by $b = \alpha_p/\beta_m$ = (protein molecules /
mRNA) × (average mRNA lifetime) (Figure 5.5). What is surprising is
that we can observe this burstiness experimentally (Figure 5.5b)!
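
The protein variance quoted above can be checked numerically. The following is a minimal sketch, not the author's code: it assumes the fluctuation-dissipation relation takes the standard Lyapunov form $J X + X J^T + D = 0$ (with $J$ the Jacobian of Eq. 5.25 and $D$ the diffusion matrix built from the four reactions), sets $\Omega = 1$, and uses the parameter values from the caption of Figure 5.5. The function name `lna_covariance` is hypothetical.

```python
import numpy as np

def lna_covariance(alpha_m, alpha_p, beta_m, beta_p):
    """Steady-state means and covariance of (mRNA, protein); Omega = 1 assumed."""
    m = alpha_m / beta_m                 # Eq. 5.26
    p = alpha_p * m / beta_p
    # Jacobian of the deterministic rate equations, Eq. 5.25
    J = np.array([[-beta_m, 0.0],
                  [alpha_p, -beta_p]])
    # Diffusion matrix: every reaction has step size 1, so each diagonal
    # entry is the sum of the birth and death rates of that species.
    D = np.diag([alpha_m + beta_m * m, alpha_p * m + beta_p * p])
    # Solve J X + X J^T + D = 0 by vectorising the 2x2 unknown X.
    I = np.eye(2)
    K = np.kron(J, I) + np.kron(I, J)
    X = np.linalg.solve(K, -D.reshape(-1)).reshape(2, 2)
    return (m, p), X

beta_m, beta_p = 1.0 / 5.0, 1.0 / 50.0   # inverse lifetimes, per minute
for alpha_m, alpha_p in [(0.1, 2.0), (0.01, 20.0)]:
    (m, p), X = lna_covariance(alpha_m, alpha_p, beta_m, beta_p)
    phi = beta_m / beta_p
    closed_form = p * (1.0 + (alpha_p / beta_p) / (1.0 + phi))
    print(f"b = {alpha_p / beta_m:6.1f}  mean p = {p:5.1f}  "
          f"var p = {X[1, 1]:8.2f}  closed form = {closed_form:8.2f}")
```

Both parameter sets give the same mean protein level, while the variance grows with the burstiness $b$, which is the point of Figure 5.5.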

The moment generating functions only work with linear transition 
rates and can be algebraically formidable. The linear noise approxima- 
tion, by contrast, is calculated algorithmically and works regardless of 
the form of the transition rates, and in fact is exact for linear transition 
rates. Having said that, it is a wonder that moment generating functions 
are used at all. 


5.3 Limitations of the LNA 


The linear noise approximation is built upon the ansatz that the full state 
can be separated into deterministic and stochastic components, with the 
fluctuations scaling with the square-root of the system size (cf. Eq. 5.1), 


$$
n_i = \Omega x_i + \sqrt{\Omega}\, \alpha_i.
$$


The limitations of the linear noise approximation are implicit in the
ansatz. First, the approximation is a perturbation expansion that is valid
so long as terms of order $1/\sqrt{\Omega}$ remain sub-dominant to the leading-order
deterministic trajectory. Second, the linear noise approximation is a local
expansion of the master equation, with the probability distribution of the
fluctuations $\Pi(\boldsymbol{\alpha},t)$ evaluated along the trajectory $\mathbf{x}(t)$.
The most conservative application of the approximation is to systems
with deterministic dynamics exhibiting a single asymptotically sta-
ble fixed-point. For multistable systems, the approximation can still be 
applied, but only within a basin of attraction thereby providing an esti- 
mation of the fluctuations along a trajectory close to a fixed point. Fur- 
thermore, with slightly more elaborate analysis, the fluctuations about a 
stable limit cycle can also be estimated (see Section 11.2 on p. 244). The 
linear noise approximation cannot, however, be used to compute global 
properties of the system, such as switching rates between fixed-points or 
splitting probabilities among multiple equilibria. For those types of com- 
putation, stochastic simulation is the only consistent method presently 
available. 


Suggested References 
The review by van Kampen of his linear noise approximation is excellent, 


• N. G. van Kampen (1976) “Expansion of the Master equation,”
Advances in Chemical Physics 34: 245.


For a critique of the Kramers-Moyal expansion as a reliable approximation
of the master equation, see the two articles by van Kampen,


• N. G. van Kampen (1981) “The validity of non-linear Langevin-
equations,” Journal of Statistical Physics 25: 431.


• N. G. van Kampen, “The diffusion approximation for Markov processes,”
in Thermodynamics and kinetics of biological processes, Ed.
I. Lamprecht and A. I. Zotin (Walter de Gruyter & Co., 1982).


Exercises 


1. Linear Fokker-Planck equation: Show that the solution of the 
linear Fokker-Planck equation is Gaussian for all time (conditioned 
by a delta-peak at t = 0). 


2. Stoichiometry vs. propensity: The deterministic rate equations 
are written in terms of the dot-product of the stoichiometry matrix 
S and the propensity vector v, Eq. 5.11. From the perspective of the 
phenomenological model, it is permissible to re-scale the entries in 
the stoichiometry to +1, and absorb the step-size into each entry in 
the propensity vector. From the point of view of characterizing the
fluctuations, it is very important to distinguish between stoichiometry
and propensity. To that end, examine the following process
with variable step-size:

$$
n \xrightarrow{\nu_1} n + a, \quad \nu_1 = \frac{\alpha}{a}, \qquad\qquad
n \xrightarrow{\nu_2} n - b, \quad \nu_2 = \frac{\beta \cdot n}{b}.
$$


(a) Show that the deterministic rate equation is independent of 
the steps a and b. 


(b) Calculate the variance in n. Show that it is strongly dependent 
upon a and b. 


3. Exponential autocorrelation in the linear noise approximation:
Derive the expression for the steady-state autocorrelation
function in the linear noise approximation, Eq. 5.18. There are several
ways to do this; perhaps the most straightforward is to generalize
the ideas developed in the section on the fluctuation-dissipation
relation, Section 2.3.2 on p. 42.


4. Brusselator near the bifurcation: The major and minor axes of
the stationary probability distribution equiprobability contours are
given by the eigenvectors of the inverse-covariance matrix, $\Xi^{-1}$. In
the Brusselator example, calculate these eigenvectors using Eq. 5.24.
What is the slope of the major axis as the parameters approach the
bifurcation $a \to b - 1$? Do you recognize this number?


5. Higher-order corrections: For a model with nonlinear degradation,

$$
n \xrightarrow{\nu_1} n + 1, \quad \nu_1 = \alpha, \qquad\qquad
2n \xrightarrow{\nu_2} n, \quad \nu_2 = \frac{\beta n (n-1)}{\Omega^2},
$$

determine the following, including terms of order $1/\Omega$ in the linear
noise approximation:


(a) The partial differential equation for $\Pi(\alpha, t)$.
(b) The mean and variance of $n$.


(c) The auto-correlation function at steady-state $\langle\langle n(t)\, n(t - \tau) \rangle\rangle$.
6. Negative feedback: Consider the nonlinear negative feedback
model,

$$
r \xrightarrow{\nu_1} r + b, \quad \nu_1 = \alpha \cdot g\big(r/(\Omega \cdot K_p)\big), \qquad\qquad
r \xrightarrow{\nu_2} r - 1, \quad \nu_2 = \beta \cdot \frac{r}{\Omega},
$$

where $g(x)$ is a Hill-type function,

$$
g(x) = \left(1 + x^n\right)^{-1}.
$$

In the regime where the system is homeostatic ($r^* \gg \Omega \cdot K_p$):


(a) Find explicit estimates for the steady-state mean $\langle r^* \rangle$, and the
steady-state variance $\langle\langle r^2 \rangle\rangle$.


(b) How does the fractional deviation about the steady-state, $\sqrt{\langle\langle r^2 \rangle\rangle}/\langle r^* \rangle$,
change as the cooperativity $n$ increases?


(c) Identify a unitless parameter that determines the validity of 
the linear noise approximation (see Section 5.1.1). 


CHAPTER 6


FOKKER-PLANCK EQUATION


The work of Ornstein and Uhlenbeck on Brownian motion suggested that 
there is a general connection between a certain class of stochastic differ- 
ential equations for a given trajectory (as studied by Langevin) and a 
certain family of partial differential equations governing the probability 
distribution for an ensemble of trajectories (as studied by Einstein). In 
this Chapter, we formalize the connection between the two approaches, 
and find that for stochastic differential equations with multiplicative white 
noise, a new calculus is required. 
The master equation (cf. Eq. 3.11 on page 59), 


$$
\frac{\partial}{\partial \tau} p(x_3|x_1,\tau) = \int_{-\infty}^{\infty} \left[ w(x_3|x_2)\, p(x_2|x_1,\tau) - w(x_2|x_3)\, p(x_3|x_1,\tau) \right] dx_2,
$$


is an integro-differential equation (or, in the case of a discrete state-space, a
discrete-differential equation, Eq. 3.12). This family of equations is difficult
to solve in general. The Kolmogorov, or Fokker-Planck, equation is
an attempt to approximate the master equation by a (one hopes) more
amenable partial differential equation governing $p(x_3|x_1,\tau)$. Kolmogorov
derived his equation by imposing abstract conditions on the moments
of the transition probabilities $w$ in the Chapman-Kolmogorov equation,
while the derivation of the same equation by Fokker and Planck proceeds
as a continuous approximation of the discrete master equation. We
saw in Chapter 5 that in systems where the noise is intrinsic to the
dynamics, both derivations are inconsistent approximations, and that a
systematic approximation method is needed. (See N. G. van Kampen
(1982) ‘The diffusion approximation for Markov processes’ in Thermodynamics
and kinetics of biological processes, Ed. I. Lamprecht and A. I.
Zotin, p. 181.)


6.1 Kolmogorov Equation 


• A. N. Kolmogorov (1931) Math. Ann. 104: 415.


Consider a stationary, Markov process — one dimensional, for simplic- 
ity. Write the Chapman-Kolmogorov equation as, 


$$
p(x|y,t+\Delta t) = \int_{-\infty}^{\infty} p(x|z,t)\, p(z|y,\Delta t)\, dz, \tag{6.1}
$$

and define the jump moments by,

$$
a_n(z,\Delta t) = \int_{-\infty}^{\infty} (y - z)^n\, p(z|y,\Delta t)\, dy.
$$

Furthermore, assume that for $\Delta t \to 0$, only the 1st and 2nd moments
become proportional to $\Delta t$ (assume all higher moments vanish in that
limit). Then, the following limits exist,

$$
A(z) = \lim_{\Delta t \to 0} \frac{1}{\Delta t}\, a_1(z,\Delta t), \qquad B(z) = \lim_{\Delta t \to 0} \frac{1}{\Delta t}\, a_2(z,\Delta t). \tag{6.2}
$$

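Now, let $R(y)$ be a suitable test function, possessing all properties required
for the following operations to be well-defined, and note that,

$$
\begin{aligned}
\int_{-\infty}^{\infty} R(y) \frac{\partial p}{\partial t}\, dy
&= \lim_{\Delta t \to 0} \frac{1}{\Delta t} \int_{-\infty}^{\infty} R(y) \left[ p(x|y,t+\Delta t) - p(x|y,t) \right] dy \\
&= \lim_{\Delta t \to 0} \frac{1}{\Delta t} \left\{ \int_{-\infty}^{\infty} R(y)\, p(x|y,t+\Delta t)\, dy - \int_{-\infty}^{\infty} R(y)\, p(x|y,t)\, dy \right\}.
\end{aligned}
$$

Using Eq. 6.1 to replace $p(x|y,t+\Delta t)$,

$$
\int_{-\infty}^{\infty} R(y) \frac{\partial p}{\partial t}\, dy
= \lim_{\Delta t \to 0} \frac{1}{\Delta t} \left\{ \int_{-\infty}^{\infty} R(y) \int_{-\infty}^{\infty} p(x|z,t)\, p(z|y,\Delta t)\, dz\, dy - \int_{-\infty}^{\infty} R(y)\, p(x|y,t)\, dy \right\}.
$$
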


We expand R(y) around the point y = z and interchange the order of 
integration. Since we have assumed all jump moments beyond the second 
vanish, we are left with, 


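$$
\begin{aligned}
\int_{-\infty}^{\infty} R(y) \frac{\partial p}{\partial t}\, dy
= \lim_{\Delta t \to 0} \frac{1}{\Delta t} \Bigg\{ & \int_{-\infty}^{\infty} R'(z)\, p(x|z,t) \left[ \int_{-\infty}^{\infty} (y - z)\, p(z|y,\Delta t)\, dy \right] dz \\
& + \int_{-\infty}^{\infty} R''(z)\, p(x|z,t) \left[ \frac{1}{2} \int_{-\infty}^{\infty} (y - z)^2\, p(z|y,\Delta t)\, dy \right] dz \Bigg\}.
\end{aligned}
$$

(Prove to yourself that this is true.) In terms of the functions $A(z)$ and
$B(z)$ defined above in Eq. 6.2,

$$
\int_{-\infty}^{\infty} R(y) \frac{\partial p}{\partial t}\, dy = \int_{-\infty}^{\infty} R'(z)\, p(x|z,t)\, A(z)\, dz + \frac{1}{2} \int_{-\infty}^{\infty} R''(z)\, p(x|z,t)\, B(z)\, dz. \tag{6.3}
$$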


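Recall that $R(y)$ is an arbitrary test function, with all the necessary properties
for the following to hold. In particular, choose $R(y)$ so that $R(y)$
and $R'(y) \to 0$ as $y \to \pm\infty$ as fast as necessary for the following to be
true,

$$
\lim_{z \to \pm\infty} R(z)\, p(x|z,t)\, A(z) = 0, \qquad
\lim_{z \to \pm\infty} R'(z)\, p(x|z,t)\, B(z) = 0.
$$

Integrating Eq. 6.3 by parts, we obtain:

$$
\int_{-\infty}^{\infty} R(z) \left\{ \frac{\partial p(x|z,t)}{\partial t} + \frac{\partial}{\partial z} \left[ A(z)\, p(x|z,t) \right] - \frac{1}{2} \frac{\partial^2}{\partial z^2} \left[ B(z)\, p(x|z,t) \right] \right\} dz = 0.
$$

By the du Bois-Reymond lemma from analysis, since $R(y)$ is an arbitrary
test function, we must conclude that,

$$
\frac{\partial p(x|y,t)}{\partial t} = -\frac{\partial}{\partial y} \left\{ A(y)\, p(x|y,t) \right\} + \frac{1}{2} \frac{\partial^2}{\partial y^2} \left\{ B(y)\, p(x|y,t) \right\}. \tag{6.4}
$$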


This is the Kolmogorov Equation, often called the Fokker-Planck Equa- 
tion, though sometimes appearing as the Smoluchowski Equation, or the 
Generalized Diffusion Equation. The first term on the right, involving 
A(y), is called the convection or drift term; the second term, involving 
B(y), is called the diffusion or fluctuation term. 

Remarks: 


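1. Eq. 6.4 may be written as a continuity equation,

$$
\frac{\partial p}{\partial t} = -\frac{\partial J}{\partial y},
$$

where $J$ is the probability flux obeying the constitutive equation,

$$
J(y) = A(y)\, p - \frac{1}{2} \frac{\partial}{\partial y} \left\{ B(y)\, p \right\}.
$$


2. The derivation above may be repeated (with minor modifications)
for an n-dimensional process; the result is then given by,

$$
\frac{\partial p}{\partial t} = -\sum_{i=1}^{n} \frac{\partial}{\partial y_i} \left\{ A_i(\mathbf{y})\, p \right\} + \frac{1}{2} \sum_{k,l=1}^{n} \frac{\partial^2}{\partial y_k \partial y_l} \left\{ B_{kl}(\mathbf{y})\, p \right\}. \tag{6.5}
$$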


3. The derivation of the Kolmogorov equation was rightly considered 
a breakthrough in the theory of stochastic processes. It is obvious 
that having a partial differential equation that is equivalent (un- 
der the assumed conditions) to the impossibly difficult Chapman- 
Kolmogorov equation is great progress. This success has been ex- 
ploited in a large number of applications since 1931; but as work 
in this area grew, subtle questions of consistency arose, and the de- 
bate is far from over today. The crux of the problem is the following: 
Despite the impeccable mathematics of Kolmogorov, the derivation 
does not hold any information regarding what physical processes ac- 
tually obey the necessary mathematical assumptions — in particular, 
for which processes one might expect the higher jump moments to 
vanish. See N. G. van Kampen (1982) ‘The diffusion approximation 
for Markov processes’ in Thermodynamics and kinetics of biological 
processes, Ed. I. Lamprecht and A. I. Zotin, p. 181. 
4. Note that the Kolmogorov equation is an evolution equation for the
conditional probability density, and that it must be solved subject
to the initial condition,

$$
p(x|y,0) = \delta(x - y).
$$


5. The initial-value problem can be solved in a few cases (see H. Risken, 
The Fokker-Planck Equation: Methods of Solution and Applications, 
(Springer-Verlag, 1996), for details). Notice, however, that before 
any attempt at solving the initial value problem is made, one must 
find an explicit expression for the coefficients A(y) and B(y). 


6.2 Derivation of the Fokker-Planck Equation


Kolmogorov’s derivation is formal, with no reference to any particular 
physical process, nor even an indication of what kinds of physical pro- 
cesses obey the necessary mathematical conditions. Fokker and Planck 
derived the same equation based on assumptions tied directly to particular
features of physical systems under consideration. Nevertheless, as
became clear in Chapter 5, Fokker’s derivation (described below)
is only consistent if the transition probabilities are linear in the state 
variables. Despite this very strict limitation, the Fokker-Planck equation 
is the favoured approximation method for many investigators (although 
rarely applied to systems with linear transition rates). 

The labeled states $x_1, x_2, x_3$ appearing in the master equation may
be interpreted as representing the initial, intermediate and final state,
respectively. Let them be re-labeled as $y_0, y', y$, so that the master equation
reads,

$$
\frac{\partial}{\partial \tau} p(y|y_0,\tau) = \int_{-\infty}^{\infty} \left[ w(y|y')\, p(y'|y_0,\tau) - w(y'|y)\, p(y|y_0,\tau) \right] dy'. \tag{6.6}
$$


This is an approximate evolution equation for the conditional probability
density, and it is linear in $p(y|y_0,\tau)$ since $w(y|y')$ is supposed to be
provided by the “physics” of the system.

If the dependence on the initial state is understood, then Eq. 6.6 may
be written in a short-hand form as,

$$
\frac{\partial}{\partial \tau} p(y,\tau) = \int_{-\infty}^{\infty} \left[ w(y|y')\, p(y',\tau) - w(y'|y)\, p(y,\tau) \right] dy'. \tag{6.7}
$$


This must not be confused with an evolution equation for the single-state
probability distribution, since $p$ is still conditioned upon the initial
distribution $p(y_0,t_0)$. We introduce the jump-size $\Delta y = y - y'$; in that
way, we can re-write the transition probabilities in terms of the jump-size
$\Delta y$. The transition probability $w(y|y')$ is the probability per unit time of
a transition $y' \to y$, i.e., that starting at $y'$, there is a jump of size $\Delta y$.
We write this as, $w(y|y') = w(y', \Delta y) = w(y - \Delta y, \Delta y)$. Similarly, we
re-write the transition probability $w(y'|y)$ as $w(y'|y) = w(y, -\Delta y)$. With
this change in notation, the master equation becomes,

$$
\begin{aligned}
\frac{\partial}{\partial \tau} p(y,\tau)
&= \int_{-\infty}^{\infty} \left[ w(y - \Delta y, \Delta y)\, p(y - \Delta y,\tau) - w(y, -\Delta y)\, p(y,\tau) \right] d\Delta y \\
&= \int_{-\infty}^{\infty} w(y - \Delta y, \Delta y)\, p(y - \Delta y,\tau)\, d\Delta y - p(y,\tau) \int_{-\infty}^{\infty} w(y, -\Delta y)\, d\Delta y.
\end{aligned}
\tag{6.8}
$$

Next, we make two assumptions, 


1. Only small jumps occur. That is, we assume that there exists a
$\delta > 0$ such that $w(y', \Delta y) \approx 0$ for $|\Delta y| > \delta$ and $w(y' + \Delta y, \Delta y) \approx
w(y', \Delta y)$ for $|\Delta y| < \delta$.


2. $p(y,\tau)$ varies slowly with $y$. That is, we assume that $p(y + \Delta y,\tau) \approx
p(y,\tau)$ for $|\Delta y| < \delta$.


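Under these conditions, one is able to Taylor expand $w$ and $p$ in $y'$ about
$y$,

$$
w(y', \Delta y)\, p(y',\tau) \approx
w\, p \Big|_{y'=y} + \left[ \frac{\partial}{\partial y'} (w\, p) \right]_{y'=y} (y' - y) + \frac{1}{2} \left[ \frac{\partial^2}{\partial y'^2} (w\, p) \right]_{y'=y} (y' - y)^2.
$$

Or, since $\Delta y = y - y'$, to $O(\Delta y^2)$,

$$
w(y', \Delta y)\, p(y',\tau) =
w(y, \Delta y)\, p(y,\tau) - \Delta y \frac{\partial}{\partial y} \left[ w(y, \Delta y)\, p(y,\tau) \right] + \frac{\Delta y^2}{2} \frac{\partial^2}{\partial y^2} \left[ w(y, \Delta y)\, p(y,\tau) \right].
$$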
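Eq. 6.8 is then approximated by,

$$
\begin{aligned}
\frac{\partial}{\partial \tau} p(y,\tau) = {} & \int_{-\infty}^{\infty} w(y, \Delta y)\, p(y,\tau)\, d\Delta y - \int_{-\infty}^{\infty} \Delta y \frac{\partial}{\partial y} \left[ w(y, \Delta y)\, p(y,\tau) \right] d\Delta y \\
& + \frac{1}{2} \int_{-\infty}^{\infty} \Delta y^2 \frac{\partial^2}{\partial y^2} \left[ w(y, \Delta y)\, p(y,\tau) \right] d\Delta y - \int_{-\infty}^{\infty} w(y, -\Delta y)\, p(y,\tau)\, d\Delta y.
\end{aligned}
\tag{6.9}
$$

Note the dependence of $w(y, \Delta y)$ on the second argument is fully maintained;
an expansion with respect to $\Delta y$ is not possible since $w$ varies
rapidly with $\Delta y$. The first and last integrals cancel upon the change of
variable $\Delta y \to -\Delta y$ in the last. Furthermore, since

$$
\int_{-\infty}^{\infty} \Delta y \frac{\partial}{\partial y} \left[ w(y, \Delta y)\, p(y,\tau) \right] d\Delta y
= \frac{\partial}{\partial y} \left\{ p(y,\tau) \int_{-\infty}^{\infty} \Delta y\, w(y, \Delta y)\, d\Delta y \right\}
= \frac{\partial}{\partial y} \left[ p(y,\tau)\, a_1(y) \right],
$$

we finally obtain the Fokker-Planck Equation,

$$
\frac{\partial}{\partial \tau} p(y,\tau) = -\frac{\partial}{\partial y} \left[ a_1(y)\, p(y,\tau) \right] + \frac{1}{2} \frac{\partial^2}{\partial y^2} \left[ a_2(y)\, p(y,\tau) \right]. \tag{6.10}
$$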


Remarks: 


1. The Fokker-Planck equation has the same mathematical form as the 
Kolmogorov equation, viz. a linear, 2nd-order partial differential
equation of parabolic type. Nevertheless, the assumptions made in 
deriving the Fokker-Planck equation make it possible to answer the 
question: For a given w, how good an approximation is provided by 
this equation? 


2. It is not difficult to include all terms in the Taylor expansion used 
above and obtain the Kramers-Moyal Expansion, 


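$$
\frac{\partial}{\partial \tau} p(y,\tau) = \sum_{n=1}^{\infty} \frac{(-1)^n}{n!} \frac{\partial^n}{\partial y^n} \left[ a_n(y)\, p(y,\tau) \right], \tag{6.11}
$$

where,

$$
a_n(y) = \int_{-\infty}^{\infty} \Delta y^n\, w(y, \Delta y)\, d\Delta y.
$$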
[Figure 6.1: Review of Markov Processes. Flow chart. The Chapman-Kolmogorov
equation (a nonlinear functional equation governing all conditional
probability densities of a Markov process) is specialized to a stationary
process. From there, assuming $a_n = 0$ for $n > 2$ gives the Kolmogorov
equation, while computing the short-time transition probability from
microscopic theory gives the master equation. Assuming the jumps are
“small”, and that $p(y,t)$ is a slowly varying function of $y$, the master
equation reduces to the Fokker-Planck equation.]


While this is formally equivalent to the master equation, to be 
practical the expansion must be truncated at a certain point — the 
Fokker-Planck equation, for example, is the result of truncation af- 
ter two terms. Breaking-off the series in a consistent fashion requires 
the introduction of a well-defined extensive variable quantifying the 
relative size of the individual jump events (as we saw in Section 5.1).
The truncated Kramers-Moyal expansion is what Einstein
used in his derivation of the diffusion equation for a Brownian par- 
ticle (cf. Eq. 1.7 on page 6). 
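
Why truncation is delicate can be seen by evaluating the jump moments for a concrete case. The sketch below is illustrative only: it assumes a one-step birth-death process (jumps of size $+1$ at rate $\alpha$ and $-1$ at rate $\beta y$), for which the integral defining $a_n(y)$ reduces to a sum over the two jump sizes. The function name and parameter values are hypothetical.

```python
def jump_moment(n, y, alpha=10.0, beta=1.0):
    """n-th jump moment a_n(y) = sum over jump sizes of (dy)**n * w(y, dy)."""
    # Transition rates w(y, dy) for the two allowed jump sizes
    rates = {+1: alpha, -1: beta * y}
    return sum(dy ** n * rate for dy, rate in rates.items())

for n in range(1, 7):
    print(n, jump_moment(n, y=4.0))
```

For this process the odd moments all equal $\alpha - \beta y$ and the even moments all equal $\alpha + \beta y$, so none of the higher moments is small in itself. Dropping them is only justified once the jump size is measured against a large system size, as in the systematic expansion.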
Figure 6.2: Evolution of $p(y,t)$ toward its equilibrium distribution
$p_{eq}(y)$.


6.3 Macroscopic Equation


• R. Kubo, K. Matsuo and K. Kitahara (1973) Fluctuation and relaxation of
macrovariables. Journal of Statistical Physics 9:51-96


We know that the study of physical phenomena always starts from a 
phenomenological approach, reflecting the initial stage of organization of 
experimental facts. This stage often leads to a mathematical model in 
terms of differential equations where the variables of interest are macro- 
scopic variables, in which the microscopic fluctuations are neglected (av- 
eraged over), resulting in a deterministic theory. Examples include Ohm’s 
law, chemical rate equations, population growth dynamics, etc. 

At the next, more fundamental (mesoscopic) level, the fluctuations 
are taken into account — by the master equation (or the Fokker-Planck 
equation, if it applies) for stationary Markov processes. As the latter de- 
termines the entire probability distribution, it must be possible to derive 
from it the macroscopic equations as an approximation for the case that 
the fluctuations are negligible. 

Let $Y$ be a physical quantity with Markov character, taking the value
$y_0$ at $t = 0$; i.e., $p(y,t|y_0,0) \to \delta(y - y_0)$ as $t \to 0$. For definiteness,
say the system is closed and isolated. Then we have from equilibrium
statistical mechanics that the probability distribution $p(y,t)$ tends toward
some equilibrium distribution $p_{eq}(y)$ as $t \to \infty$,

$$
\lim_{t \to \infty} p(y,t) = p_{eq}(y),
$$

(see figure 6.2). We know from experience that the fluctuations remain
small during the whole process, and so $p(y,t)$ for each $t$ is a sharply-peaked
function. The location of this peak is a fairly well-defined number,
having an uncertainty of the order of the width of the peak, and it is to
be identified with the macroscopic value $\bar{y}(t)$. Usually, one takes,

$$
\bar{y}(t) = \langle Y \rangle_t = \int_{-\infty}^{\infty} y\, p(y,t)\, dy. \tag{6.12}
$$

As $t$ increases, the peak slides bodily along the $y$-axis from its initial
location $\bar{y}(0) = y_0$ to its final location $\bar{y}(\infty) = y_{eq}$; the width of the
distribution growing to the equilibrium value.

Bearing this picture in mind, we have: 


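$$
\frac{d\bar{y}}{dt} = \int_{-\infty}^{\infty} y \frac{\partial}{\partial t} p(y,t)\, dy.
$$

Using either the master equation or the Fokker-Planck equation, we have,

$$
\frac{d}{dt} \bar{y}(t) = \int_{-\infty}^{\infty} a_1(y)\, p(y,t)\, dy = \langle a_1(y) \rangle_t, \tag{6.13}
$$

which we identify as the macroscopic equation.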


Example: One-step process (see p. 62). The discrete master equation
reads,

$$
\frac{dp_n}{dt} = r_{n+1}\, p_{n+1} + g_{n-1}\, p_{n-1} - (r_n + g_n)\, p_n. \tag{6.14}
$$

Multiplying both sides by $n$, and summing over all $n \in (-\infty, \infty)$, we
obtain the evolution equation for the average $n$,

$$
\frac{d\langle n \rangle}{dt} = \langle g_n \rangle - \langle r_n \rangle. \tag{6.15}
$$

For example, with $r_n = \beta n$ and $g_n = \alpha$, Eq. 6.15 reduces to:

$$
\frac{d\langle n \rangle}{dt} = \alpha - \beta \langle n \rangle.
$$

Note in this last example, we had $\langle r_n \rangle = \langle r(n) \rangle_t = r(\langle n \rangle_t)$ and similarly
for $g_n$. The situation illustrated by this example holds in general:


Proposition: If the function $a_1(y)$ is linear in $y$, then the
macroscopic variable $\bar{y}(t)$ satisfies the macroscopic equation,

$$
\frac{d}{dt} \bar{y}(t) = a_1(\bar{y}), \tag{6.16}
$$

which follows exactly from the master equation.
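
The one-step example with linear rates can be checked directly. The following is a minimal sketch under stated assumptions: it integrates the master equation (Eq. 6.14) with $g_n = \alpha$, $r_n = \beta n$ by forward Euler on a truncated state space $0 \le n \le n_{max}$ (the truncation, the step size and the parameter values are choices made here), and compares the resulting mean with the solution of the macroscopic equation, $\langle n \rangle = (\alpha/\beta)(1 - e^{-\beta t})$ for $\langle n \rangle(0) = 0$.

```python
import math

def integrate_birth_death(alpha, beta, t_end, dt=1e-3, n_max=80):
    """Forward-Euler solution of Eq. 6.14 with g_n = alpha, r_n = beta*n, p_0(0) = 1."""
    p = [0.0] * (n_max + 1)
    p[0] = 1.0
    for _ in range(int(round(t_end / dt))):
        new = p[:]
        for n in range(n_max + 1):
            g_n = alpha if n < n_max else 0.0   # no births out of the top state
            r_n = beta * n
            gain = 0.0
            if n + 1 <= n_max:
                gain += beta * (n + 1) * p[n + 1]   # r_{n+1} p_{n+1}
            if n >= 1:
                gain += alpha * p[n - 1]            # g_{n-1} p_{n-1}
            new[n] = p[n] + dt * (gain - (r_n + g_n) * p[n])
        p = new
    return p

alpha, beta, t_end = 10.0, 1.0, 2.0
p = integrate_birth_death(alpha, beta, t_end)
mean_numeric = sum(n * pn for n, pn in enumerate(p))
mean_macro = (alpha / beta) * (1.0 - math.exp(-beta * t_end))
print(mean_numeric, mean_macro)
```

The two means agree up to the time-discretization error, as the proposition requires for linear $a_1$.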
If, however, $a_1(y)$ is nonlinear in $y$, then Eq. 6.16 $\neq$ Eq. 6.13, and the
macroscopic equation is no longer determined uniquely by the initial value of $\bar{y}$.
Not only that, but one must also specify what “macroscopic equation” means.
In the literature, one finds Eq. 6.16 used even when $a_1(y)$ is nonlinear; the
(usually unstated) assumption is that since the fluctuations are small,
and $a_1(y)$ is smooth, we may expand it about $\langle y \rangle_t = \bar{y}$,

$$
a_1(y) = a_1(\bar{y}) + a_1'(\bar{y}) \cdot (y - \bar{y}) + \frac{1}{2} a_1''(\bar{y}) \cdot (y - \bar{y})^2 + \ldots
$$

Taking the average, we have,

$$
\langle a_1(y) \rangle_t = a_1(\bar{y}) + \frac{1}{2} a_1''(\bar{y}) \left\langle (y - \bar{y})^2 \right\rangle_t + \ldots
$$

so that

$$
\langle a_1(y) \rangle_t \approx a_1(\bar{y}),
$$

on the grounds that the fluctuations $\langle (y - \bar{y})^2 \rangle_t$ are small. In that case,
we read Eq. 6.16 as an approximation,

$$
\frac{d}{dt} \bar{y}(t) \approx a_1(\bar{y}), \tag{6.17}
$$

if $a_1(y)$ is nonlinear; this is the “meaning” that is assigned to the
macroscopic equation.

It is also possible to deduce from the master equation (or the Fokker-Planck
equation) an approximate evolution for the width of the distribution;
first, one shows that

$$
\frac{d\langle y^2 \rangle_t}{dt} = \langle a_2(y) \rangle_t + 2 \langle y\, a_1(y) \rangle_t,
$$

from which it follows that the variance $\sigma^2(t) = \langle y^2 \rangle_t - \langle y \rangle_t^2$ obeys,

$$
\frac{d\sigma^2}{dt} = \langle a_2(y) \rangle_t + 2 \langle (y - \bar{y})\, a_1(y) \rangle_t. \tag{6.18}
$$

Again, if $a_1$ and $a_2$ are linear in $y$, then Eq. 6.18 is identical with,

$$
\frac{d\sigma^2}{dt} = a_2(\bar{y}) + 2 \sigma^2 a_1'(\bar{y}), \tag{6.19}
$$

though in general, Eq. 6.19 will be only an approximation. This equation
for the variance may now be used to compute corrections to the macroscopic
equation, Eq. 6.17,

$$
\frac{d}{dt} \bar{y}(t) = a_1(\bar{y}) + \frac{1}{2} \sigma^2 a_1''(\bar{y}), \tag{6.20}
$$

$$
\frac{d\sigma^2}{dt} = a_2(\bar{y}) + 2 \sigma^2 a_1'(\bar{y}). \tag{6.21}
$$
Note that, by definition, $a_2(y) > 0$, and for the system to evolve toward
a stable steady state, one needs $a_1'(y) < 0$. It follows that $\sigma^2$ tends to
increase at the rate $a_2$, but this tendency is kept in check by the second
term; hence,

$$
\sigma^2 \to \frac{a_2}{2 |a_1'|}, \tag{6.22}
$$

and so the condition for the approximate validity of Eq. 6.17 (namely
that the 2nd term in Eq. 6.20 be small compared to the first) is given by,

$$
\frac{a_2}{2 |a_1'|} \cdot \frac{1}{2} |a_1''| \ll |a_1|, \tag{6.23}
$$

which says that it is the second derivative of $a_1$ (responsible for the departure
from linearity) which must be small. The linear noise approximation,
which we considered in Chapter 5 (Section 5.1), provides a more satisfying
derivation of the macroscopic evolution from the master equation,
proceeding as it does from a systematic expansion in some well-defined
small parameter.
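
The coupled equations for the mean and variance (Eqs. 6.20 and 6.21) are easy to integrate numerically. The sketch below is one possible implementation, with a hypothetical function name and a plain forward-Euler step; the caller supplies $a_1$, its first two derivatives, and $a_2$. It is exercised on the linear case $a_1 = -ky$, $a_2 = D$, where the variance should relax to $a_2 / (2|a_1'|) = D/2k$ (Eq. 6.22).

```python
def integrate_moments(a1, da1, d2a1, a2, y0, var0, t_end, dt=1e-3):
    """Forward-Euler integration of Eqs. 6.20-6.21 for the mean and variance."""
    y, var = y0, var0
    for _ in range(int(round(t_end / dt))):
        dy = a1(y) + 0.5 * var * d2a1(y)        # Eq. 6.20
        dvar = a2(y) + 2.0 * var * da1(y)       # Eq. 6.21
        y, var = y + dt * dy, var + dt * dvar
    return y, var

k, D = 1.0, 2.0
y_inf, var_inf = integrate_moments(
    a1=lambda y: -k * y,
    da1=lambda y: -k,
    d2a1=lambda y: 0.0,
    a2=lambda y: D,
    y0=3.0, var0=0.0, t_end=20.0)
print(y_inf, var_inf, D / (2.0 * k))
```

For a nonlinear $a_1$ the same routine applies, with the caveat from the text that the result is then only an approximation whose quality is governed by Eq. 6.23.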


6.3.1 Coefficients of the Fokker-Planck Equation 
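The coefficients of the Fokker-Planck equation,

$$
\frac{\partial}{\partial t} p(y,t) = -\frac{\partial}{\partial y} \left[ a_1(y)\, p(y,t) \right] + \frac{1}{2} \frac{\partial^2}{\partial y^2} \left[ a_2(y)\, p(y,t) \right], \tag{6.24}
$$

are given by,

$$
a_1(y) = \int_{-\infty}^{\infty} \Delta y\, w(y, \Delta y)\, d\Delta y, \qquad
a_2(y) = \int_{-\infty}^{\infty} \Delta y^2\, w(y, \Delta y)\, d\Delta y.
$$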


In theory, they can be computed explicitly since w(y, Ay) is known from 
an underlying microscopic theory. In practice, that is not always easy to 
do, and an alternative method would be highly convenient. The way one 
usually proceeds is the following: 
Let $p(y,t|y_0,t_0)$ be the solution of Eq. 6.24 with initial condition
$\delta(y - y_0)$. According to Eq. 3.4, i.e.,

$$
p(y_2,t_2) = \int_{-\infty}^{\infty} p(y_2,t_2|y_1,t_1)\, p(y_1,t_1)\, dy_1,
$$

one may construct a Markov process with transition probability $p(y_2,t_2|y_1,t_1)$
whose one-time distribution $p(y_1,t_1)$ may still be chosen arbitrarily at one
initial time $t_0$. If one chooses the “steady-state” solution of Eq. 6.24,

$$
p_s(y) = \frac{\text{constant}}{a_2(y)} \exp \left[ 2 \int_{0}^{y} \frac{a_1(y')}{a_2(y')}\, dy' \right], \tag{6.25}
$$

then the resulting Markov process is stationary. This, of course, is only
possible if $p_s$ is integrable. For closed systems, Eq. 6.25 may be identified
with the equilibrium probability distribution $p_{eq}$ known from statistical
mechanics.
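
The steady-state solution quoted in Eq. 6.25 follows in a few lines from the continuity form of the equation. The sketch below assumes that the probability flux vanishes at steady state (for example, natural or reflecting boundaries), which is an assumption added here rather than stated above:

```latex
\begin{align*}
0 = J &= a_1(y)\, p_s(y) - \frac{1}{2} \frac{d}{dy} \left[ a_2(y)\, p_s(y) \right], \\
\frac{d}{dy} \ln \left[ a_2(y)\, p_s(y) \right] &= \frac{2\, a_1(y)}{a_2(y)}, \\
p_s(y) &= \frac{\text{constant}}{a_2(y)} \exp \left[ 2 \int_{0}^{y} \frac{a_1(y')}{a_2(y')}\, dy' \right].
\end{align*}
```

As a quick check, $a_1 = -ky$ and $a_2 = D$ give $p_s(y) \propto \exp(-k y^2 / D)$, a Gaussian with variance $D/2k$.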

To “derive” the coefficients of the Fokker-Planck equation, we proceed
as follows. From Eq. 6.24, the macroscopic equation is,

$$
\frac{d}{dt} \langle y \rangle_t = \langle a_1(y) \rangle_t \approx a_1(\langle y \rangle_t)
$$

(neglecting fluctuations). Since this equation must coincide with the
equation known from the phenomenological theory, the function $a_1$ is
determined. Next, we know $p_{eq}$ from statistical mechanics, identified with
Eq. 6.25; hence $a_2$ is also determined. Note the various (for the most part
uncontrolled) assumptions made in this derivation.

The simplest of all procedures, however, is the derivation based upon 
the Langevin equation. Recall that this begins with the dynamical de- 
scription, 


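$$
\frac{dy}{dt} + \beta y = f(t), \tag{6.26}
$$

with assumptions about the statistics of the fluctuating force $f(t)$,

$$
\langle f(t) \rangle = 0, \qquad \langle f(t_1) f(t_2) \rangle = 2D\, \delta(t_1 - t_2). \tag{6.27}
$$

Integrating Eq. 6.26 over a short time interval,

$$
\Delta y = -\beta y\, \Delta t + \int_{t}^{t+\Delta t} f(\xi)\, d\xi,
$$

so that,

$$
a_1(y) = \lim_{\Delta t \to 0} \frac{\langle \Delta y \rangle}{\Delta t} = -\beta y.
$$

Similarly,

$$
\langle \Delta y^2 \rangle = \beta^2 y^2 (\Delta t)^2 + \int_{t}^{t+\Delta t} \int_{t}^{t+\Delta t} \langle f(\xi) f(\eta) \rangle\, d\xi\, d\eta,
$$

from which it follows, using Eq. 6.27, that

$$
a_2(y) = 2D,
$$

and the corresponding Fokker-Planck equation reads,

$$
\frac{\partial p}{\partial t} = \beta \frac{\partial}{\partial y} (y p) + D \frac{\partial^2 p}{\partial y^2}. \tag{6.28}
$$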

This coincides with the appropriate Fokker-Planck equation for the posi- 
tion of a Brownian particle in the absence of an external field, as derived 
by Ornstein and Uhlenbeck. 

The identification of a;(y) as the macroscopic rate law is really only 
valid when a;(y) is a linear function of y — as it is in the Ornstein- 
Uhlenbeck process described above. For nonlinear systems, derivation 
of the Fokker-Planck equation in the manner described here can lead to 
serious difficulties, and so the systematic expansion scheme described in 
Section 5.1 is indispensable. See N. G. van Kampen (1965) Fluctuations 
in Nonlinear Systems. 
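
The short-time averages used above to read off the coefficients can be imitated numerically. The sketch below is an illustration under stated assumptions, not a method from the text: it takes a Langevin equation with linear drift, written here as $-\beta y$ (the symbol is an assumption), and white noise of strength $2D$ as in Eq. 6.27, draws many increments $\Delta y$ from a fixed starting point, and estimates $\langle \Delta y \rangle / \Delta t$ and $\langle \Delta y^2 \rangle / \Delta t$. The function name, sample size and seed are hypothetical choices.

```python
import random

def estimate_coefficients(y, beta=1.0, D=0.5, dt=0.01, n_samples=200_000, seed=0):
    """Estimate a_1(y) and a_2(y) from the first two moments of short-time increments."""
    rng = random.Random(seed)
    sigma = (2.0 * D * dt) ** 0.5          # std of the integrated noise over dt
    s1 = s2 = 0.0
    for _ in range(n_samples):
        dy = -beta * y * dt + sigma * rng.gauss(0.0, 1.0)
        s1 += dy
        s2 += dy * dy
    return s1 / (n_samples * dt), s2 / (n_samples * dt)

a1_est, a2_est = estimate_coefficients(y=1.0)
print(a1_est, a2_est)   # compare with -beta*y and 2*D
```

The second estimate carries a bias of order $\beta^2 y^2 \Delta t$, which is the $(\Delta t)^2$ term in $\langle \Delta y^2 \rangle$ that vanishes in the limit $\Delta t \to 0$.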


6.4 Pure Diffusion and Ornstein-Uhlenbeck 
Processes 
As an example of the Fokker-Planck equation, we shall consider two fam- 


ilies of diffusive equations - pure diffusion, and diffusion with drift (which 
is the same as the Ornstein-Uhlenbeck process shown above, Eq. 6.28). 
6.4.1 Wiener process


First consider the simple case with $a_1 = 0$, $a_2 = 2k$, where $k$ is some
(positive) constant. Then we have the problem,

$$
\frac{\partial p}{\partial t} = k \frac{\partial^2 p}{\partial y^2}, \qquad \text{with } p(y|y_0,0) = \delta(y - y_0). \tag{6.29}
$$

This is a parabolic partial differential equation with constant coefficients,
so it is readily solved using a Fourier transform in $y$ (see section B.4.2 on
page 292),

$$
\phi(s,t) = \int_{-\infty}^{\infty} e^{isy} p(y,t)\, dy; \qquad \phi(s,0) = e^{i s y_0}.
$$

The transformed Eq. 6.29 leads to the ordinary differential equation in $\phi$,

$$
\frac{d\phi(s,t)}{dt} = -s^2 k\, \phi(s,t),
$$

whose solution is,

$$
\phi(s,t) = \exp \left[ i s y_0 - k s^2 t \right].
$$

This is a very famous Fourier transform that appears in many models
of heat conduction. The inverse Fourier transform back to $p(y,t)$ yields
what is sometimes called the ‘heat kernel’,

$$
p(y,t) = \frac{1}{\sqrt{4 \pi k t}} \exp \left[ -\frac{(y - y_0)^2}{4 k t} \right]. \tag{6.30}
$$


In particular, for $k = D$, this coincides with Einstein’s result for a free
particle in Brownian motion (i.e. in the absence of a damping force),

$$
\frac{dW}{dt} = \sqrt{2D}\, \eta(t),
$$

sometimes also called a Wiener process (denoted $W(t)$), or a purely diffusive
process, or sometimes even called Brownian motion, although that
designation is not very descriptive.
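
Two properties of the heat kernel are worth confirming: it stays normalized, and its variance grows as $2kt$. The sketch below checks both with a simple Riemann sum. The grid width, the number of points and the parameter values are arbitrary choices made for illustration.

```python
import math

def heat_kernel(y, t, k=1.0, y0=0.0):
    """Solution of dp/dt = k d2p/dy2 started from a delta-peak at y0 (Eq. 6.30)."""
    return math.exp(-(y - y0) ** 2 / (4.0 * k * t)) / math.sqrt(4.0 * math.pi * k * t)

def kernel_moments(t, k=1.0, y0=0.0, half_width=40.0, n=40001):
    """Normalization, mean and variance of the kernel by a Riemann sum."""
    h = 2.0 * half_width / (n - 1)
    norm = mean = second = 0.0
    for i in range(n):
        y = y0 - half_width + i * h
        w = h * heat_kernel(y, t, k, y0)
        norm += w
        mean += w * y
        second += w * y * y
    return norm, mean, second - mean ** 2

norm, mean, var = kernel_moments(t=2.0, k=1.5, y0=1.0)
print(norm, mean, var, 2.0 * 1.5 * 2.0)
```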
Because the forcing function $\eta(t)$ is delta-correlated, $\langle \eta(t_1) \eta(t_2) \rangle =
\delta(t_1 - t_2)$, the autocorrelation function of the Wiener process is
straightforward to compute, viz.,

$$
\langle W(t_1) W(t_2) \rangle = 2D \int_{0}^{t_1} \int_{0}^{t_2} \delta(u_1 - u_2)\, du_2\, du_1 = 2D \min(t_1, t_2).
$$


6.4.2 Ornstein-Uhlenbeck process 


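Now consider the more general Fokker-Planck equation, $a_1 = -ky$, $a_2 =
D$, Eq. 6.28,

$$
\frac{\partial p}{\partial t} = k \frac{\partial}{\partial y} (y p) + \frac{D}{2} \frac{\partial^2 p}{\partial y^2}, \tag{6.33}
$$

where $k$ and $D$ are (positive) constants. Proceeding as above, taking the
Fourier transform, we are left with a first-order linear partial differential
equation,

$$
\frac{\partial \phi}{\partial t} + k s \frac{\partial \phi}{\partial s} = -\frac{D}{2} s^2 \phi. \tag{6.34}
$$

This is easily solved by the method of characteristics to give the partial
solution,

$$
\phi(s,t) = e^{-D s^2 / 4k}\, g\!\left( s e^{-kt} \right), \tag{6.35}
$$

in terms of an arbitrary function $g$. From the initial distribution $p(y|y_0,0) =
\delta(y - y_0)$, the arbitrary function $g$ is fully determined:

$$
g(x) = \exp \left[ \frac{D x^2}{4k} + i x y_0 \right]. \tag{6.36}
$$

The complete solution is therefore,

$$
\phi(s,t) = \exp \left[ -\frac{D s^2}{4k} \left( 1 - e^{-2kt} \right) + i s y_0 e^{-kt} \right]. \tag{6.37}
$$

Comparison with the solution above for the purely diffusive process shows
that the density is a Gaussian with

$$
\langle y(t) \rangle = y_0 e^{-kt}, \qquad
\sigma^2(t) = \frac{D}{2k} \left( 1 - e^{-2kt} \right), \tag{6.38}
$$

as derived by Ornstein and Uhlenbeck (see Chapter 1).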
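
The mean and variance in Eq. 6.38 can be compared with a direct simulation. The following sketch is an assumption-laden illustration: it uses the Euler-Maruyama scheme (a technique not described in the text) for the Langevin equation $dy = -ky\, dt + \sqrt{D}\, dW$, which corresponds to $a_1 = -ky$ and $a_2 = D$, and averages over an ensemble of paths started at $y_0$. The function name, step size, ensemble size and seed are hypothetical choices.

```python
import numpy as np

def simulate_ou(y0, k, D, t_end, dt=1e-3, n_paths=20_000, seed=1):
    """Euler-Maruyama ensemble for dy = -k*y*dt + sqrt(D)*dW, all paths started at y0."""
    rng = np.random.default_rng(seed)
    y = np.full(n_paths, float(y0))
    for _ in range(int(round(t_end / dt))):
        y += -k * y * dt + np.sqrt(D * dt) * rng.standard_normal(n_paths)
    return y

y0, k, D, t = 3.0, 1.0, 2.0, 1.0
y = simulate_ou(y0, k, D, t)
mean_theory = y0 * np.exp(-k * t)
var_theory = D / (2.0 * k) * (1.0 - np.exp(-2.0 * k * t))
print(y.mean(), mean_theory)
print(y.var(), var_theory)
```

Agreement is limited by sampling error (of order $1/\sqrt{N}$ for $N$ paths) and by the time-step bias of the scheme, so the comparison should be read as approximate.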


Figure 6.3: Some solutions of the Fokker-Planck equation, (6.39).
A) Liouville equation, $a_2 = 0$. Without diffusion, the initial delta-distribution
follows the deterministic trajectory. B) Wiener process,
$a_1 = A$, $a_2 = D$, both constant. The solution is a Gaussian,
but the variance spreads $\propto t$. C) Ornstein-Uhlenbeck process,
$a_1 = -ky$, $a_2 = D$. The solution is Gaussian, relaxing to an equilibrium
distribution around $y = 0$; the width marked in the panel is
$2\sqrt{(D/2k) \cdot (1 - e^{-2kt})}$. It should be understood that in this figure
$p(y,t) = p(y,t|y_0,0)$.


6.4.3 Heuristic Picture of the Fokker-Planck Equation


Examining several limiting cases of the Fokker-Planck equation is useful 
in developing intuition regarding the behaviour of the solution. Here, 
again for convenience, is the general Fokker-Planck equation, 


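$$
\frac{\partial}{\partial t} p(y,t) = -\frac{\partial}{\partial y} \left[ a_1(y)\, p(y,t) \right] + \frac{1}{2} \frac{\partial^2}{\partial y^2} \left[ a_2(y)\, p(y,t) \right]. \tag{6.39}
$$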


We begin with the same initial condition $p(y,0) = \delta(y - y_0)$, then by
choosing different coefficients $a_1(y)$ and $a_2(y)$, certain characteristics of
the solution can be made clear (Figure 6.3).


1. Liouville equation ($a_1(y) = A(y)$, $a_2(y) = 0$). The probability
remains a delta-peak, localized about $y(t)$, i.e., $p(y,t|y_0,0) =
\delta(y - y(t))$, where $y(t)$ obeys the deterministic ordinary differential
equation,

$$
\frac{dy}{dt} = A(y); \qquad y(0) = y_0.
$$
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2. Wiener process (ai(y) = A, a2(y) = D). This is the equation 
governing pure diffusion. The mean follows the line y(t) = yo + A-t, 
and the variance grows linearly with t, (y?) — (y)? = 2Dt. 


3. Ornstein-Uhlenbeck process ($a_1(y) = -ky$, $a_2(y) = D$). The
coefficient $a_1(y) = -ky$ is like a Hooke's force restoring the mean
state to $y = 0$ exponentially quickly,


\[
\langle y(t)\rangle = y_0 e^{-kt}.
\]


The variance grows until it reaches the equilibrium value $\langle y^2\rangle =
D/2k$. The equilibrium distribution is the Gaussian centered at
$y = 0$ with variance $\langle y^2\rangle = D/2k$, viz.


\[
p^{\infty}(y) = \frac{1}{\sqrt{2\pi (D/2k)}} \exp\left[-\frac{1}{2}\frac{y^2}{(D/2k)}\right].
\]
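
The Ornstein-Uhlenbeck case can also be explored by direct simulation. The sketch below integrates the corresponding Langevin-type update, $y \to y - k y\,\Delta t + \sqrt{D\,\Delta t}\,\xi$ with $\xi$ a standard normal variate, using the Euler-Maruyama scheme; this discretization, the parameter values and the function name `simulate_ou` are assumptions made for illustration only. For a large ensemble the sample mean and variance should approach $y_0 e^{-kt}$ and $(D/2k)(1 - e^{-2kt})$, up to sampling and time-step error.

```python
import numpy as np

def simulate_ou(y0=2.0, k=1.0, D=0.5, t_end=1.0, n_steps=1000,
                n_paths=50_000, seed=0):
    """Euler-Maruyama ensemble for drift a1(y) = -k*y and diffusion a2(y) = D.

    Returns the ensemble of values y(t_end), one entry per sample path.
    """
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    y = np.full(n_paths, y0, dtype=float)
    for _ in range(n_steps):
        # Deterministic relaxation plus a Gaussian kick of variance D*dt.
        y += -k * y * dt + np.sqrt(D * dt) * rng.standard_normal(n_paths)
    return y
```

For example, `simulate_ou().mean()` and `simulate_ou().var()` can be compared against the two closed-form expressions quoted above.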


6.5 Connection to Langevin Equation 


From the work of Einstein and Langevin, there is clearly a direct anal- 
ogy between the Fokker-Planck and Langevin equations in the case of 
Brownian motion. That direct correspondence holds in general and can 
be uniquely assigned for Langevin equations where the coefficient of the 
noise term is constant. For non-constant coefficients, the correspondence 
is no longer unique and an additional interpretation rule for the Langevin 
equation must be provided. 

We shall outline the connection between the Fokker-Planck and Langevin 
equations below, for the cases of linear and non-linear drift coefficients 
with the requirement that the noise coefficient is constant. Interpretation 
rules for the case where the noise coefficient is a function of the state 
variables will be postponed to Chapter 7. 


6.5.1 Linear Damping 


We shall review the general procedure for the linear case. Langevin begins 
with the dynamical equation 


\[
\frac{dy}{dt} = -\beta y + \eta(t), \tag{6.40}
\]


where the random force $\eta(t)$ is supposed to have the properties


\[
\langle \eta(t)\rangle = 0, \tag{6.41}
\]

\[
\langle \eta(t)\eta(t')\rangle = \Gamma\,\delta(t - t'). \tag{6.42}
\]


Note that Eqs. 6.41 and 6.42 are not enough to fully specify the stochastic
process $\eta(t)$; they only specify the first two moments. Also note that
because of the random forcing $\eta(t)$, the variable $y(t)$ becomes a stochastic
process. Suppose that $y(0) = y_0$, we have from Eq. 6.40,


\[
y(t) = y_0 e^{-\beta t} + e^{-\beta t}\int_0^t e^{\beta t'}\eta(t')\,dt'. \tag{6.43}
\]

Averaging over the ensemble and using Eq. 6.41, 


\[
\langle y(t)\rangle_{y_0} = y_0 e^{-\beta t}. \tag{6.44}
\]


Moreover, after squaring, averaging and using Eq. 6.42, one gets, 
\[
\langle y^2(t)\rangle_{y_0} = y_0^2 e^{-2\beta t} + \frac{\Gamma}{2\beta}\left(1 - e^{-2\beta t}\right). \tag{6.45}
\]
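
For readers who want the intermediate steps, the second moment follows from squaring the formal solution $y(t) = y_0 e^{-\beta t} + e^{-\beta t}\int_0^t e^{\beta t'}\eta(t')\,dt'$ and using $\langle\eta(t)\rangle = 0$ and $\langle\eta(t)\eta(t')\rangle = \Gamma\delta(t - t')$; the following is a sketch of that calculation:

```latex
\begin{align*}
\langle y^2(t)\rangle_{y_0}
 &= y_0^2 e^{-2\beta t}
  + 2 y_0 e^{-2\beta t}\int_0^t e^{\beta t'}\langle\eta(t')\rangle\,dt'
  + e^{-2\beta t}\int_0^t\!\!\int_0^t e^{\beta(t'+t'')}
    \langle\eta(t')\eta(t'')\rangle\,dt'\,dt'' \\
 &= y_0^2 e^{-2\beta t} + \Gamma e^{-2\beta t}\int_0^t e^{2\beta t'}\,dt' \\
 &= y_0^2 e^{-2\beta t} + \frac{\Gamma}{2\beta}\left(1 - e^{-2\beta t}\right).
\end{align*}
```

The cross term vanishes because the noise has zero mean, and the delta function collapses the double integral to a single one.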


The constant $\Gamma$ is unknown so far; however, for $t \gg 1/\beta$ the effect of
the initial conditions must disappear and the system reaches equilibrium.
Equilibrium statistical mechanics (the equipartition of energy) can be
used to derive,


\[
\frac{1}{2} m \langle y^2\rangle = \frac{1}{2} k_B T, \tag{6.46}
\]


where $k_B$ is Boltzmann's constant and $T$ is the equilibrium temperature.
Taking $t \to \infty$ in Eq. 6.45 and comparing with the equilibrium result
above, we have


\[
\Gamma = \frac{2\beta k_B T}{m}, \tag{6.47}
\]


an example of the so-called Fluctuation-Dissipation Relation (see Eq. 2.31 
on page 44). Physically, the random force $\eta(t)$ creates a tendency for $y$
to spread out over an ever broadening range of values, while the damping 
term tries to bring y back to the origin. The equilibrium distribution is 
the resulting compromise between these two opposing tendencies. 

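To derive the relation between the Langevin equation and the Fokker-Planck
equation, take for $t$ in Eq. 6.44 a small time interval $\Delta t$. Then,


\[
\langle \Delta y\rangle_{y_0} = \langle y(\Delta t)\rangle_{y_0} - y_0 = -\beta y_0 \Delta t + O(\Delta t^2). \tag{6.48}
\]

Therefore, the drift coefficient in the Fokker-Planck equation is,


\[
a_1(y_0) = \lim_{\Delta t \to 0}\frac{\langle \Delta y\rangle_{y_0}}{\Delta t} = -\beta y_0. \tag{6.49}
\]

Similarly, from the square of Eq. 6.43 (cf. Eq. 6.45),

\[
\langle \Delta y^2\rangle_{y_0} = \Gamma\,\Delta t + O(\Delta t^2), \tag{6.50}
\]

to give $a_2(y) = \Gamma$. The resulting Fokker-Planck equation reads,

\[
\frac{\partial p}{\partial t} = \frac{\partial}{\partial y}\left[(\beta y)\,p\right] + \frac{1}{2}\frac{\partial^2}{\partial y^2}\left(\Gamma p\right). \tag{6.51}
\]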


This Fokker-Planck equation returns the same values for the first- and
second-moments of $y(t)$ as does the Langevin equation (6.40), along with
Eqs. 6.41 and 6.42; however, it cannot yet be said that they are equivalent
because the higher moments do not agree. In fact, whereas the
Fokker-Planck equation provides a definite expression for each moment,
the Langevin equation does not unless higher moments of $\eta(t)$ are
specified. Customarily, the assumption is


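• All odd moments of $\eta(t)$ vanish.


• All even moments obey the same rule as for Gaussian distributions,
e.g. decomposition into pair-wise products,


\[
\begin{aligned}
\langle \eta(t_1)\eta(t_2)\eta(t_3)\eta(t_4)\rangle
 &= \langle \eta(t_1)\eta(t_2)\rangle\langle \eta(t_3)\eta(t_4)\rangle
  + \langle \eta(t_1)\eta(t_3)\rangle\langle \eta(t_2)\eta(t_4)\rangle
  + \langle \eta(t_1)\eta(t_4)\rangle\langle \eta(t_2)\eta(t_3)\rangle \\
 &= \Gamma^2\left[\delta(t_1 - t_2)\delta(t_3 - t_4) + \delta(t_1 - t_3)\delta(t_2 - t_4) + \delta(t_1 - t_4)\delta(t_2 - t_3)\right].
\end{aligned}
\tag{6.52}
\]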
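
The pair-wise decomposition rule generalizes to any even number of factors: the moment is a sum over all ways of splitting the factors into pairs, and every odd moment vanishes because no complete pairing exists. The sketch below enumerates those pairings for a finite-dimensional zero-mean Gaussian vector with covariance matrix `cov`, used here as a stand-in for the delta-correlated noise (a delta function cannot be evaluated numerically). The function names `pairings` and `wick_moment` are hypothetical and chosen only for this illustration.

```python
def pairings(items):
    """Yield every way of splitting `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

def wick_moment(indices, cov):
    """Moment <x_i x_j ...> of a zero-mean Gaussian vector with covariance `cov`.

    The moment is the sum over all pairings of the product of pair covariances.
    With an odd number of factors no pairing exists and the result is zero.
    """
    total = 0.0
    for pairing in pairings(list(indices)):
        term = 1.0
        for i, j in pairing:
            term *= cov[i][j]
        total += term
    return total
```

For four factors there are three pairings, matching the three terms of Eq. 6.52; for six factors there are fifteen.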


Alternatively, one may stipulate that all higher cumulants of $\eta$ vanish
beyond the second. The equivalence between the Fokker-Planck and
Langevin descriptions then holds: According to Eq. 6.43, the value of
$y(t)$ is a linear combination of the values that $\eta(t')$ takes at all previous
times ($0 \le t' \le t$). Since the joint distribution of all quantities $\eta(t')$
is Gaussian, it follows that $y(t)$ is Gaussian. By the same argument,
the joint distribution of $y(t_1), y(t_2), \ldots$, is Gaussian. Hence the process
$y(t)$ determined by Eq. 6.40 with initial condition $y_0$ is Gaussian. On
the other hand, we know that the solution of the Fokker-Planck equation
(6.51) with initial value $y_0$ is also a Gaussian. Since its coefficients have
been chosen so that the first- and second-moments of both Gaussians are
the same, it follows that the two Gaussians are identical. (Note that noise
$\eta(t)$ defined as above is called Gaussian white-noise.)


6.5.2 Nonlinear Damping


Another case occurring frequently in applications is when the Langevin 
equation contains a nonlinear damping term, 


\[
\frac{dy}{dt} = A(y) + \eta(t), \tag{6.53}
\]


where the random force is still assumed to be Gaussian white-noise, but
the damping term $A(y)$ is a nonlinear function of $y$. Although in this case
an explicit solution for $y(t)$ is not usually available, it can still be argued
that the Langevin equation is equivalent to the Fokker-Planck equation,


\[
\frac{\partial p}{\partial t} = -\frac{\partial}{\partial y}\left[A(y)\,p\right] + \frac{\Gamma}{2}\frac{\partial^2 p}{\partial y^2}. \tag{6.54}
\]


Outline of the proof: First, it is clear that for each sample function
$\eta(t)$, Eq. 6.53 along with the initial condition $y(0) = y_0$ uniquely
determines $y(t)$ from the existence/uniqueness theorem for differential
equations. Since the values of $\eta(t)$ at different times are independent, it follows
that $y(t)$ is Markovian. Hence, it obeys a master equation which may be
written in the Kramers-Moyal form (see Eq. 6.11 on page 126).

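Next, for a very small $\Delta t$, Eq. 6.53 gives


\[
\Delta y = \int_t^{t+\Delta t} A(y(t'))\,dt' + \int_t^{t+\Delta t} \eta(t')\,dt'; \tag{6.55}
\]

hence, upon averaging,

\[
\langle \Delta y\rangle_{y_0} = A(y_0)\,\Delta t + O(\Delta t^2), \tag{6.56}
\]

to yield $a_1(y) = A(y)$ as before. Taking the square of Eq. 6.55 and
averaging,

\[
\begin{aligned}
\langle \Delta y^2\rangle = {} & \left\langle \left( \int_t^{t+\Delta t} A(y(t'))\,dt' \right)^2 \right\rangle \\
 & + 2\int_t^{t+\Delta t}\left( \int_t^{t+\Delta t} \langle A(y(t'))\,\eta(t'')\rangle\,dt' \right) dt'' \\
 & + \int_t^{t+\Delta t}\left( \int_t^{t+\Delta t} \langle \eta(t')\eta(t'')\rangle\,dt'' \right) dt'.
\end{aligned}
\]

The first line is $O(\Delta t^2)$, so it does not contribute to $a_2(y)$. The third line
gives,


\[
\int_t^{t+\Delta t}\left( \int_t^{t+\Delta t} \langle \eta(t')\eta(t'')\rangle\,dt'' \right) dt' = \int_t^{t+\Delta t} \Gamma\,dt' = \Gamma\,\Delta t, \tag{6.57}
\]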


as in the linear case. This agrees with the last term of Eq. 6.54, provided
the second term above does not contribute. To untangle the second term,
expand $A(y(t'))$ about $y(t)$,


\[
A(y(t')) = A(y(t)) + A'(y(t))\,\left(y(t') - y(t)\right) + \cdots.
\]


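Substituting this in the second term above,


\[
2\int_t^{t+\Delta t}\left( \int_t^{t+\Delta t} \langle A(y(t'))\,\eta(t'')\rangle\,dt' \right) dt'' = \tag{6.58}
\]

\[
2A(y(t))\,\Delta t \int_t^{t+\Delta t} \langle \eta(t'')\rangle\,dt''
 + 2A'(y(t)) \int_t^{t+\Delta t}\left( \int_t^{t+\Delta t} \langle [y(t') - y(t)]\,\eta(t'')\rangle\,dt' \right) dt'' + \cdots, \tag{6.59}
\]

the first term of which vanishes because $\langle \eta(t'')\rangle = 0$, and
the last term of which is $O(\Delta t^2)$, and therefore does not contribute to
$a_2(y)$. Similarly, one can show that all higher-order moments contribute
nothing, since $\langle (\Delta y)^n\rangle = o(\Delta t)$ for $n > 2$. That concludes the outline of the proof.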
In summary, the Langevin equation,


\[
\frac{dy}{dt} = A(y) + c\,\eta(t),
\]

(with $\langle \eta(t)\eta(t')\rangle = \Gamma\delta(t - t')$ and the higher-moments of $\eta(t)$ suitably
defined) is equivalent to the Fokker-Planck equation,


\[
\frac{\partial p}{\partial t} = -\frac{\partial}{\partial y}\left[A(y)\,p\right] + \frac{c^2\Gamma}{2}\frac{\partial^2 p}{\partial y^2},
\]

irrespective of the form of $A(y)$. Difficulties emerge once $c(y)$ is no longer
constant. We can proceed naively, and introduce the change of variables,


\[
\bar{y} = \int^{y} \frac{dy'}{c(y')}, \qquad \bar{A}(\bar{y}) = \frac{A(y)}{c(y)}, \qquad \bar{P}(\bar{y}) = P(y)\,c(y), \tag{6.60}
\]

transforming the nonlinear Langevin equation¹,


\[
\frac{dy}{dt} = A(y) + c(y)\,\eta(t), \tag{6.61}
\]

to the Langevin equation described above,

\[
\frac{d\bar{y}}{dt} = \bar{A}(\bar{y}) + \eta(t). \tag{6.62}
\]


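This equation, in turn, is equivalent to the Fokker-Planck equation,


\[
\frac{\partial \bar{P}}{\partial t} = -\frac{\partial}{\partial \bar{y}}\left[\bar{A}(\bar{y})\,\bar{P}\right] + \frac{\Gamma}{2}\frac{\partial^2 \bar{P}}{\partial \bar{y}^2}. \tag{6.63}
\]

In the original variables,


\[
\frac{\partial P(y,t)}{\partial t} = -\frac{\partial}{\partial y}\left\{\left[A(y) + \frac{\Gamma}{2}\,c(y)\,c'(y)\right] P(y,t)\right\} + \frac{\Gamma}{2}\frac{\partial^2}{\partial y^2}\left[c^2(y)\,P(y,t)\right]. \tag{6.64}
\]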


There is a problem: Eq. 6.61 as it stands has no meaning! The difficulty
comes in trying to interpret the product $c(y)\eta(t)$. Since $\eta(t)$ causes a jump
in $y$, what value of $y$ should be used in $c(y)$? Specifically, how is one to
interpret the $\int c(y(t'))\eta(t')\,dt'$ term in the equivalent integral equation,


\[
y(t + dt) - y(t) = \int_t^{t+dt} A(y(t'))\,dt' + \int_t^{t+dt} c(y(t'))\,\eta(t')\,dt' \;?
\]

Stratonovich chose to replace $c(y(t'))$ by its mean value over the interval,
giving,

\[
y(t + dt) - y(t) = \int_t^{t+dt} A(y(t'))\,dt' + c\left(\frac{y(t) + y(t + dt)}{2}\right) \int_t^{t+dt} \eta(t')\,dt'. \tag{6.65}
\]


One can prove that this interpretation indeed leads to Eq. 6.64. Other 
choices are possible, generating different transformation laws. The calcu- 
lus of stochastic functions is the focus of the next chapter, but first we 
shall turn attention to a classic application of the Fokker-Planck equation 
to the modeling of a physical system — Kramers escape over a potential 
barrier. 
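
To see that the choice of interpretation has measurable consequences, the sketch below simulates $dy/dt = c(y)\eta(t)$ with $A(y) = 0$, $c(y) = \sigma y$ and $\Gamma = 1$ using two update rules. The first evaluates $c$ at the start of each step (a non-anticipating, Itô-type rule); the second uses a Heun predictor-corrector step, which averages $c$ over the start and predicted end of the step in the spirit of Eq. 6.65. Both numerical schemes, the parameter values and the function name `mean_after` are assumptions made for this illustration. With the midpoint rule the extra drift $\tfrac{\Gamma}{2}c\,c' = \tfrac{1}{2}\sigma^2 y$ predicts $\langle y(t)\rangle = y_0 e^{\sigma^2 t/2}$, whereas the start-of-step rule leaves the mean at $y_0$.

```python
import numpy as np

def mean_after(t_end=1.0, sigma=0.5, y0=1.0, n_steps=200,
               n_paths=200_000, seed=1):
    """Ensemble means of dy = sigma*y*dW under two interpretation rules.

    Returns (mean with c evaluated at the start of the step,
             mean with c averaged over the step via a Heun update).
    """
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    y_start = np.full(n_paths, y0, dtype=float)   # start-of-step rule
    y_heun = np.full(n_paths, y0, dtype=float)    # averaged (midpoint-like) rule
    for _ in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)
        y_start = y_start + sigma * y_start * dW
        y_pred = y_heun + sigma * y_heun * dW                     # predictor
        y_heun = y_heun + 0.5 * sigma * (y_heun + y_pred) * dW    # corrector
    return y_start.mean(), y_heun.mean()
```

The same noise increments drive both ensembles, so any difference in the means comes from the interpretation rule alone.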


1 Nonlinear Langevin equation is the designation usually given to any equation of
the Langevin type where $c(y)$ is non-constant, irrespective of the (non)linearity of
$A(y)$.


6.6 Limitations of the Kramers-Moyal Expansion


The Fokker-Planck equation, as derived in this chapter, remains the most
popular approximation method to calculate a solution for the probability
distribution of fluctuations in nonlinear systems. Typically, the
Fokker-Planck equation is generated through a two-term Kramers-Moyal
expansion of the master equation (see Eq. 6.11 on page 126). The result is
a Fokker-Planck equation with nonlinear drift and diffusion coefficients.
There are several problems with this approach. First, the expansion is
not consistent since the step-operator is approximated by a continuous
operator, but the transition probabilities remain as nonlinear functions of
the microscopic variables. Second, the nonlinear Fokker-Planck equation
has no general solution for state dimensions greater than one. That is, as
soon as the system of interest has more than one variable, the nonlinear
Fokker-Planck equation must be solved numerically or by some specialized
approximation method. Since the nonlinear Fokker-Planck equation
derived by the Kramers-Moyal expansion is consistent only insofar as it
agrees with the linear Fokker-Planck equation derived using the linear
noise approximation, and since the nonlinear Fokker-Planck equation is
so difficult to solve, the Kramers-Moyal expansion seems an undesirable
approximation method.
For a more detailed discussion, see: 


• N. G. van Kampen (1981) “The validity of non-linear Langevin-
equations,” Journal of Statistical Physics 25: 431.


• N. G. van Kampen, “The diffusion approximation for Markov
processes,” in Thermodynamics and kinetics of biological processes, Ed.
I. Lamprecht and A. I. Zotin (Walter de Gruyter & Co., 1982).


6.7 Example: Kramers' Escape Problem


• H. A. Kramers (1940) “Brownian motion in a field of force and the diffusion
model of chemical reactions,” Physica 7: 284.


• D. ter Haar, Master of Modern Physics: The Scientific Contributions of H. A.
Kramers. Princeton University Press, 1998. Chapter 6.


• P. Hanggi, P. Talkner and M. Borkovec (1990) “Reaction-rate theory: 50 years
after Kramers,” Reviews of Modern Physics 62: 251.


Figure 6.4: Kramers Escape from a Potential Well. A) Potential
field with smooth barrier. Redrawn from Kramers (1940). B) The steady-
state probability distribution is used, normalized as though the barrier at
$X = B$ were not there. The escape time is computed by calculating the
rate at which particles pass $X = B$ with positive velocity.


Briefly, Kramers' escape problem is the following: Consider a particle
trapped in a potential well, subject to Brownian motion (Figure 6.4). 
What is the probability that the particle will escape over the barrier? 
Among the many applications of this model, Kramers suggested that the 
rates of chemical reactions could be understood via this mechanism. 

Imagine a Brownian particle subject to a position-dependent force
$F(X)$, in addition to the damping and fluctuations of Brownian motion,


\[
m\frac{d^2 X}{dt^2} = -\gamma\frac{dX}{dt} + F(X) + \eta(t). \tag{6.66}
\]


Here, $\eta(t)$ is the same Gaussian white-noise forcing that appears in Langevin's
equation. Suppose the force can be written as the gradient of a potential
$U(X)$ (as is always the case in one-dimension),


\[
F(X) = -\frac{dU}{dX} \equiv -U'(X).
\]


Kramers had in mind a two-welled potential, as depicted in Figure 6.4a.
To derive the associated Fokker-Planck equation, it is convenient to
re-write Eq. 6.66 explicitly in terms of the position $X$ and the velocity $V$
(with $\gamma$ the friction coefficient multiplying the velocity in Eq. 6.66),


\[
\frac{dX}{dt} = V, \qquad m\frac{dV}{dt} = -\gamma V + F(X) + \eta(t).
\]


We are after a Fokker-Planck equation that describes the bivariate
probability distribution $P(X, V, t)$. The coefficients for the drift are
straightforward to calculate (see Section 6.3.1),


\[
\lim_{\Delta t \to 0}\frac{\langle \Delta X\rangle}{\Delta t} = V, \qquad
\lim_{\Delta t \to 0}\frac{\langle \Delta V\rangle}{\Delta t} = \left\{\frac{F(X)}{m} - \frac{\gamma}{m}V\right\}. \tag{6.67}
\]


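It is likewise straightforward to show that the diffusion coefficients $\langle \Delta X^2\rangle$
and $\langle \Delta X \Delta V\rangle$ vanish,


\[
\lim_{\Delta t \to 0}\frac{\langle \Delta X^2\rangle}{\Delta t} = 0, \qquad
\lim_{\Delta t \to 0}\frac{\langle \Delta X \Delta V\rangle}{\Delta t} = 0. \tag{6.68}
\]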


For the $\langle \Delta V^2\rangle$-term, we must appeal to statistical mechanics which says
that the equilibrium distribution for the velocity is given by the Maxwell
distribution,


\[
P^e(V) = \left(\frac{m}{2\pi kT}\right)^{1/2} \exp\left[-\frac{m}{2kT}V^2\right]. \tag{6.69}
\]
In that way, one can show (Exercise 9a),

\[
\lim_{\Delta t \to 0}\frac{\langle \Delta V^2\rangle}{\Delta t} = 2\,\frac{\gamma}{m}\,\frac{kT}{m}. \tag{6.70}
\]
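The full Fokker-Planck equation is then,

\[
\frac{\partial P(X,V,t)}{\partial t} = -V\frac{\partial P}{\partial X} + \frac{U'(X)}{m}\frac{\partial P}{\partial V} + \frac{\gamma}{m}\left\{\frac{\partial}{\partial V}\left(V P\right) + \frac{kT}{m}\frac{\partial^2 P}{\partial V^2}\right\}, \tag{6.71}
\]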


solved subject to the initial condition $P(X, V, 0) = \delta(X - A)\delta(V - 0)$. As
a first estimate, we assume the well-depth $W = U(B) - U(A)$ is large compared with $kT$.
We are then justified in using the stationary distribution for $P(X, V, t)$
around the point $X = A$ (with the mass scaled to $m = 1$ for the remainder of
this estimate),


\[
P^s(X,V) = N^{-1}\exp\left[-\frac{U(X) + \frac{1}{2}V^2}{kT}\right]. \tag{6.72}
\]

There is escape over the barrier, so obviously $P^s(X,V)$ is not correct for
long times, but over short times, the probability is very small near the
top of the barrier $X = B$ (Figure 6.4b). We can then set $P^s(X,V) = 0$
for $X > B$. The normalization constant $N$ is approximately,


\[
N \approx \sqrt{2\pi kT}\int_{-\infty}^{B} e^{-U(X)/kT}\,dX \approx \frac{2\pi kT}{\omega_a}\,e^{-U(A)/kT},
\]

where $\omega_a^2 = U''(A)$ is the curvature at the base of the well, $X = A$. The
escape rate $1/\tau$ (with $\tau$ the escape time) comes from calculating the outward
flow across the top of the barrier, at $X = B$,


\[
\frac{1}{\tau} = \int_0^\infty V\,P^s(B,V)\,dV = N^{-1}e^{-U(B)/kT}\int_0^\infty V e^{-V^2/2kT}\,dV = \frac{\omega_a}{2\pi}\,e^{-W/kT}. \tag{6.73}
\]


The underlying assumption is that if the particle is at the top of the
barrier with positive velocity, then it will escape, as if there were an
absorbing barrier at $X = B$. A heuristic interpretation of Eq. 6.73 is that
the particle oscillates in a potential well $\frac{1}{2}\omega_a^2 X^2$ and therefore hits the
wall $\omega_a/2\pi$ times per second, each time with a probability of $e^{-W/kT}$ to
get across (Figure 6.4b).
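
The quality of the approximation $1/\tau \approx (\omega_a/2\pi)e^{-W/kT}$ can be gauged numerically for a concrete potential. The sketch below uses the cubic potential $U(X) = \tfrac{1}{2}X^2 - \tfrac{1}{3}X^3$ (well at $A = 0$, barrier top at $B = 1$), unit mass, and a simple trapezoid rule for the position integral in $N$; the potential, the temperature, the integration range and the name `kramers_check` are all choices made for this illustration rather than anything taken from Kramers' paper.

```python
import numpy as np

def kramers_check(kT=0.01):
    """Compare the flux-over-barrier rate with (omega_a / 2 pi) exp(-W / kT).

    Assumes unit mass and the illustrative potential U(x) = x^2/2 - x^3/3.
    Returns (rate from numerical normalization, rate from the closed formula).
    """
    def U(x):
        return 0.5 * x**2 - x**3 / 3.0

    A, B = 0.0, 1.0                       # well bottom and barrier top
    x = np.linspace(-3.0, B, 400_001)     # U is large at x = -3, so the tail is negligible
    w = np.exp(-U(x) / kT)
    dx = x[1] - x[0]
    position_integral = dx * (w.sum() - 0.5 * (w[0] + w[-1]))   # trapezoid rule
    N = np.sqrt(2 * np.pi * kT) * position_integral             # velocity integral done analytically
    rate_numeric = kT * np.exp(-U(B) / kT) / N                  # int_0^inf V exp(-V^2/2kT) dV = kT
    omega_a = 1.0                                               # omega_a^2 = U''(A) = 1
    W = U(B) - U(A)
    rate_formula = omega_a / (2 * np.pi) * np.exp(-W / kT)
    return rate_numeric, rate_formula
```

The two numbers differ only through the anharmonic corrections to the Gaussian approximation of the well, which shrink as $kT/W$ decreases.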

In a more sophisticated treatment, the Fokker-Planck Eq. 6.71 is
solved with an initial distribution inside the well and the escape rate
is determined by the flow past a point $X = C$ sufficiently far from the top
of the barrier that the probability of return is negligible (Figure 6.4a). This
is Kramers' escape problem. There is no known solution for a potential
well of the shape drawn in Figure 6.4a, so clever approximation methods
are the only recourse for estimating the escape time. In Exercise 11, an
approximate solution is developed in the limit of large friction, $\gamma \gg 1$.


Suggested References 


The encyclopedic reference for applications of the Fokker-Planck equation
along with an exhaustive detailing of existing solution methods is the
following text by Risken:


• The Fokker-Planck Equation: Methods of Solutions and
Applications (2nd Ed.), H. Risken (Springer, 1996).


For all kinds of first-passage and escape problems,


• A Guide to First-Passage Processes, S. Redner (Cambridge
University Press, 2001).


is recommended.


Exercises


1. One-step process: Fill in the details in the derivation of Eq. 6.15
from Eq. 6.14.


2. Autocorrelation of the Wiener process. Fill in the details
leading to Eq. 6.32. It may be helpful to write the Dirac delta
function as the derivative of the unit-step function, and integrate
by parts.


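3. Deterministic equations: Starting from the Fokker-Planck
equation,


\[
\frac{\partial f(x,t)}{\partial t} = -\frac{\partial}{\partial x}\left[a_1(x,t) f(x,t)\right] + \frac{1}{2}\frac{\partial^2}{\partial x^2}\left[a_2(x,t) f(x,t)\right],
\]


(a) Derive the deterministic (macroscopic) rate equations for $\langle X(t)\rangle$
and $\langle X^2(t)\rangle$.

(b) If $a_1(x) = a(x) + \frac{1}{2} b(x) b'(x)$ and $a_2(x) = b^2(x)$, derive

\[
\frac{\partial f(x,t)}{\partial t} = -\frac{\partial}{\partial x}\left[a(x) f(x,t)\right] + \frac{1}{2}\frac{\partial}{\partial x}\left\{ b(x)\frac{\partial}{\partial x}\left[b(x) f(x,t)\right]\right\}.
\]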


4. Deterministic dynamics as a Markov process: Consider the
deterministic differential equation $dy/dt = f(y)$.


(a) Show that the process $y(t)$ is a Markov process.

(b) Derive the associated Fokker-Planck equation (called the Liouville
equation). Show that if $y(t)$ satisfies the Liouville equation,
then it likewise satisfies the original differential equation.


5. Stratonovich Fokker-Planck equation. Fill in the missing details
that lead from the change of variables, Eq. 6.60, to the Fokker-
Planck equation, Eq. 6.64.


6. Diffusion in a wedge: For a Brownian particle diffusing in an
infinite wedge with absorbing edges (Figure 6.5), compute the
probability that a particle starting from $(r_0, \theta_0)$ (where $\theta_0$ is the angle
with the horizontal) is absorbed by the horizontal edge of the wedge.


7. Langevin equation with nonlinear drift: In Eq. 6.59, show that
the last term,


\[
2A'(y(t)) \int_t^{t+\Delta t}\left( \int_t^{t+\Delta t} \langle [y(t') - y(t)]\,\eta(t'')\rangle\,dt' \right) dt'',
\]

is indeed $O(\Delta t^2)$ as claimed in the text.


Figure 6.5: Diffusion in a wedge. An infinite wedge forming angle $\theta$
with the horizontal. The edges are absorbing. What is the probability
that a Brownian particle beginning at $(r_0, \theta_0)$ is absorbed by the
horizontal edge?


8. First-passage time: Consider the diffusion process $X(t)$ in one-
dimension on a finite-interval $(0, L)$.


(a) Solve the Fokker-Planck equation for the process,


\[
\frac{\partial f(x,t)}{\partial t} = D\frac{\partial^2 f(x,t)}{\partial x^2},
\]


for $t > 0$, on the interval $0 < x < L$, with the initial condition
$f(x,0) = \delta(x - x_0)$ (where $x_0 \in (0, L)$), with reflecting
boundary conditions at $x = 0$ and $x = L$ (no-flux boundary
conditions). Hint: Your answer will be given in terms of a
cosine Fourier series.


(b) What is the stationary probability density function for the
process $X(t)$?

(c) For absorbing boundary conditions ($f(0,t) = f(L,t) = 0$), the
trapping-probability $T_i(t)$ is the flux of probability through the
boundary point at $i$. Find the splitting probability $S_i$ that a
particle beginning at $x_0$ will be eventually trapped at boundary
point $i$. Find the average lifetime of the particle starting at
$x = x_0$ before it is trapped at $x = 0$, $x = L$, or unconditionally
trapped at either boundary.


(d) Show that for $L \to \infty$, the probability distribution for diffusion
on the positive half of the $x$-axis, with reflecting boundary
conditions at $x = 0$, can be written as


\[
f(x,t) = \frac{1}{\sqrt{4\pi D t}}\left[ e^{-(x - x_0)^2/4Dt} + e^{-(x + x_0)^2/4Dt} \right].
\]

If the boundary condition is absorbing at $x = 0$, how is the
solution changed? Provide a physical interpretation of the
distribution with absorbing conditions at $x = 0$.


(e) In the semi-infinite domain ($L \to \infty$), suppose the $x = 0$
boundary is absorbing. Show that for a particle starting at
$x = x_0$ trapping at $x = 0$ is certain, but that the average
lifetime is infinite! This is an example of Zermelo's paradox (see
p. 67).


9. Rayleigh particle: In 1891 (some 15 years before Einstein's work),
Lord Rayleigh published his study of the motion of a particle
buffeted by collisions with the surrounding gas molecules. The motion
is studied on a timescale over which the velocity relaxes (a finer time
scale than Einstein's study of Brownian motion). In one dimension,
the macroscopic equation for the velocity is given by the damping
law,


\[
\frac{dV}{dt} = -\gamma V. \tag{6.74}
\]


(a) Derive the Fokker-Planck equation governing the distribution
for the velocity $P(V,t)$. The drift coefficient is easily found,

\[
\lim_{\Delta t \to 0}\frac{\langle \Delta V\rangle}{\Delta t} = -\gamma V.
\]

From statistical mechanics, the equilibrium distribution for the
velocity is given by the Maxwell distribution,


\[
P^e(V) = \left(\frac{m}{2\pi kT}\right)^{1/2} \exp\left[-\frac{m}{2kT}V^2\right]. \tag{6.75}
\]

Use this to derive the diffusion coefficient, and thereby the full
Fokker-Planck equation governing $P(V,t)$.

(b) For the initial condition $V(0) = V_0$, find $\langle V(t)\rangle$ and
$\langle\langle V(t)V(t + \tau)\rangle\rangle$.


10. Brownian motion in a magnetic field: A Brownian particle
with mass $m$ and charge $q$ moves in three-dimensional space spanned
by a rectangular coordinate system $xyz$ under the influence of a
constant, homogeneous magnetic field $\mathbf{B}$ directed along the $z$-axis,
$\mathbf{B} = (0, 0, B)$. The velocity vector $\mathbf{V}(t) = (V_x(t), V_y(t), V_z(t))^T$
satisfies the 3-component Langevin equation,


\[
m\frac{d\mathbf{V}}{dt} = -\beta\mathbf{V}(t) - q\mathbf{B}\times\mathbf{V}(t) + \sqrt{2\beta k_B T}\,\boldsymbol{\Gamma}(t), \tag{6.76}
\]

where $\beta$ is the friction coefficient, and $\boldsymbol{\Gamma}(t) = (\Gamma_x(t), \Gamma_y(t), \Gamma_z(t))^T$
is a vector of Gaussian white noise with independent components.


(a) Assuming over-damped motion, neglect the inertial term in
Eq. 6.76 and derive the Langevin equation for the position
$\mathbf{R}(t) = (X(t), Y(t), Z(t))^T$ in matrix form,


\[
\frac{d\mathbf{R}}{dt} = \mathbf{C}\cdot\boldsymbol{\Gamma}(t), \tag{6.77}
\]


where $\mathbf{C}$ is a $3 \times 3$ matrix.

(b) Derive from Eq. 6.77 the coefficients of the Fokker-Planck
equation governing the joint probability distribution $f(x, y, z, t)$ for
the position process $\mathbf{R}(t)$.

(c) Solve the Fokker-Planck equation in 10b subject to the initial
condition $f(x, y, z, 0) = \delta(x)\delta(y)\delta(z)$ by using a 3-component
Fourier transform. Show that the components $X(t)$ and $Y(t)$
are independent Wiener processes, each with zero mean and
variance $2D_B t$, where $D_B = D/\left[1 + (qB/\beta)^2\right]$ and $D = k_B T/\beta$.
What can you say about the motion $Z(t)$?


11. Large damping ($\gamma \gg 1$) limit of Kramers Fokker-Planck
equation: For large damping, the velocity relaxes very quickly, and
so we would like an equation that governs the marginal distribution
for the position $X$ alone, $P(X, t)$.


(a) In Eq. 6.71, re-scale the variables,


\[
X = x\sqrt{kT/m}, \qquad V = v\sqrt{kT/m}, \qquad F(X) = f(x)\sqrt{kT/m}, \tag{6.78}
\]


to derive a re-scaled Fokker-Planck equation.

(b) We seek a perturbation expansion of $P(x, v, t)$ in powers of
$\gamma^{-1}$. To that end, make the ansatz,


\[
P(x,v,t) = P^{(0)}(x,v,t) + \gamma^{-1}P^{(1)}(x,v,t) + \gamma^{-2}P^{(2)}(x,v,t) + \ldots
\]


Substitute into the re-scaled Fokker-Planck equation, and find
$P(x, v, t)$ up to $O(\gamma^{-2})$.

(c) Integrate $P(x, v, t)$ over $v$ to obtain the marginal distribution
$P(x, t)$ and show that it obeys the Fokker-Planck equation,


\[
\frac{\partial P(x,t)}{\partial t} = -\gamma^{-1}\left(\frac{\partial}{\partial x} f(x) P - \frac{\partial^2 P}{\partial x^2}\right) + O(\gamma^{-2}),
\]


or, in the original variables,


\[
\frac{\partial P(X,t)}{\partial t} = -\frac{\partial}{\partial X}\frac{F(X)}{m\gamma}P(X,t) + \frac{kT}{m\gamma}\frac{\partial^2}{\partial X^2}P(X,t). \tag{6.79}
\]


CHAPTER 7


STOCHASTIC ANALYSIS


7.1 Limits 


We return now to Doob’s criticism of Ornstein and Uhlenbeck’s study of 
Brownian motion (cf. page 20), 


The purpose of this paper is to apply the methods and results 
of modern probability theory to the analysis of the Ornstein- 
Uhlenbeck distribution, its properties, and its derivation... 
A stochastic differential equation is introduced in a rigorous 
fashion to give a precise meaning to the Langevin equation for 
the velocity function. This will avoid the usually embarrassing 
situation in which the Langevin equation, involving the second- 
derivative of x, is used to find a solution x(t) not having a 
second-derivative. 


—J. L. Doob (1942) Annals of Math. 43:351. 


To understand Doob’s objection, we must extend the definition of the 
derivative to include random functions. This definition, in turn, requires 
a definition for the limit of a sequence of random variables, of which 
there are several. We choose to use the mean-square limit here because 
of the close association between limits defined in this way and limits from 
ordinary calculus. For more details, M. Loève “Probability theory,” Van
Nostrand (1966) is recommended.


Definition: The random variable $\xi$ is the limit of the sequence
of random variables $\{\xi_1, \xi_2, \ldots, \xi_n\}$, if


\[
\lim_{n \to \infty}\langle |\xi - \xi_n|^2\rangle = 0; \tag{7.1}
\]


i.e. if for any $\varepsilon > 0$, there exists an $N = N(\varepsilon)$ such that for all
$n > N$, $\langle |\xi - \xi_n|^2\rangle < \varepsilon$. A limit defined in this way is usually
called a mean-square limit (or a limit in the mean-square) and
$\{\xi_n\}$ is said to converge to $\xi$ in the mean-square.


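Using Chebyshev's inequality,


\[
P\left\{|\xi_n - \xi| \ge \varepsilon\right\} \le \frac{\langle |\xi_n - \xi|^2\rangle}{\varepsilon^2},
\]

it is straightforward to verify that Eq. 7.1 implies that the sequence also
converges in probability,


\[
\lim_{n \to \infty} P\left\{|\xi_n - \xi| \ge \varepsilon\right\} = 0.
\]


This means, in turn, that given any $\varepsilon > 0$ and $\eta > 0$, there exists an
$N_0 = N_0(\varepsilon, \eta)$ such that for all $n > N_0$,


\[
P\left\{|\xi_n - \xi| < \varepsilon\right\} > 1 - \eta.
\]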


(If $\xi$ is the limit in the mean-square, then it is also the limit in
probability.) Using these definitions, the statement of the ergodic theorem from
Chapter 2 can be made more precise. We will not do so here; instead, we 
shall discuss some conditions which have to be imposed on a stationary 
stochastic process to guarantee the validity of the ergodic theorem. 
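
A concrete sequence makes the relation between mean-square convergence and convergence in probability tangible. In the sketch below, $\xi_n$ is the fraction of heads in $n$ fair coin flips, which converges to $\xi = 1/2$; this example and the function names are assumptions introduced purely for illustration. Both the mean-square error $\langle|\xi_n - \tfrac{1}{2}|^2\rangle = 1/(4n)$ and the tail probability are computed exactly from the binomial distribution, so the Chebyshev bound can be checked without sampling error.

```python
from math import comb

def mean_square_error(n):
    """<|xi_n - 1/2|^2> for xi_n = fraction of heads in n fair coin flips."""
    return sum(comb(n, j) * (j / n - 0.5) ** 2 for j in range(n + 1)) / 2 ** n

def tail_probability(n, eps):
    """P{|xi_n - 1/2| >= eps}, summed exactly over the binomial distribution.

    The small tolerance guards against floating-point round-off when
    |j - n/2| lands exactly on the boundary eps * n.
    """
    favourable = sum(comb(n, j) for j in range(n + 1)
                     if abs(j - n / 2) >= eps * n - 1e-9)
    return favourable / 2 ** n
```

For any `n` and `eps`, `tail_probability(n, eps)` should not exceed `mean_square_error(n) / eps**2`, and both quantities shrink as `n` grows.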

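One question that arises is: Under what conditions do we have,


\[
m = \langle \xi(t)\rangle = \lim_{T \to \infty}\frac{1}{T}\int_0^T \xi(t)\,dt = \lim_{N \to \infty}\frac{1}{N}\sum_{t=1}^{N}\xi(t)\;? \tag{7.2}
\]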


This question was answered by Slutski (1938) (see Doob’s “Stochastic 
Processes,” Wiley (1960) for more details): 
Consider the centered correlation function,


\[
C(\tau) = \langle [\xi(t + \tau) - m]\,[\xi(t) - m]\rangle = B(\tau) - m^2,
\]

then Eq. 7.2 holds if, and only if,


\[
\lim_{T \to \infty}\frac{1}{T}\int_0^T C(\tau)\,d\tau = 0, \tag{7.3}
\]

where the limits are defined as above.

In practice, the function $C(\tau) \to 0$ as $\tau \to \infty$ since the dependence
between $\xi(t + \tau)$ and $\xi(t)$ usually weakens as $\tau \to \infty$. Of course, in this
case Eq. 7.3 is satisfied. Another case of practical importance occurs
when $C(\tau)$ = (function $\to 0$ as $\tau \to \infty$) + (periodic terms), as when $\xi(t)$
contains a purely harmonic component of the type $\xi = \xi_0 e^{i\omega t}$, where $\xi_0$ is
a random variable and $\omega$ is a real number. Then, too, Eq. 7.3 is satisfied.


7.2  Mean-square Continuity 


Throughout this chapter, we shall modify the standard definitions of cal- 
culus (continuity, differentiability and integrability) to include stochastic 
processes. The idea of convergence in the mean-square will be crucial. 
Likewise, we shall find that wherever mean-squared convergence appears, 
reformulation of the definition in terms of the autocorrelation is natural. 
We begin with the definition of mean-squared continuity. 


Definition: A process $\xi(t)$ is mean-square continuous at $t$ if,
and only if, the limit


\[
\lim_{h \to 0}\left\langle \left(\xi(t) - \xi(t - h)\right)^2\right\rangle = 0, \tag{7.4}
\]


exists.


The formal definition can be recast in a more useful form involving
the correlation function $B(t_1, t_2) = \langle \xi(t_1)\xi(t_2)\rangle$. The above limit exists
if, and only if, $B(t_1, t_2)$ is continuous in $t_1$ and $t_2$ at the point $t_1 = t_2 = t$;
that is, the limit


\[
\lim_{h_1, h_2 \to 0}\left[B(t - h_1, t - h_2) - B(t, t)\right] = 0, \tag{7.5}
\]

must exist. Notice this is not the same as setting $h_1 = h_2 = h$ and taking
the limit as $h \to 0$. If $\xi(t)$ is wide-sense stationary, then $B(t, t - \tau) = B(\tau)$
is a function of the time-difference only and the condition for continuity
is satisfied if, and only if, $B(\tau)$ is continuous at $\tau = 0$; that is, the limit


\[
\lim_{h \to 0}\left[B(h) - B(0)\right] = 0, \tag{7.6}
\]

must exist. From Eq. 7.6 it should be clear that if a wide-sense stationary
process is continuous at one time $t$, it is continuous for all time.
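
As a short worked example of the continuity criterion, take the Wiener process, whose autocorrelation is $\langle W(t_1)W(t_2)\rangle = \min(t_1, t_2)$ (Eq. 6.32). Expanding the mean-square increment in terms of the autocorrelation gives, for $h > 0$:

```latex
\begin{align*}
\left\langle \left( W(t) - W(t-h) \right)^2 \right\rangle
 &= B(t,t) - 2B(t,t-h) + B(t-h,t-h) \\
 &= t - 2(t-h) + (t-h) \\
 &= h \;\longrightarrow\; 0 \quad \text{as } h \to 0 .
\end{align*}
```

The increment vanishes linearly in $h$, so the Wiener process is mean-square continuous even though it is nonstationary.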


7.3 Stochastic Differentiation 


The derivative of a stochastic process is defined in a formal way: 


Definition: A process $\xi(t)$ is mean-square differentiable at
$t$ if, and only if, there exists a random variable, denoted by
$\xi'(t)$, such that the limit


\[
\lim_{h \to 0}\left\langle \left( \frac{\xi(t + h) - \xi(t)}{h} - \xi'(t) \right)^2 \right\rangle = 0 \tag{7.7}
\]

exists.


The definition, as usual, is difficult to use in practice. As a more
useful corollary, one can show that Eq. 7.7 is satisfied if, and only if, the
autocorrelation function $B(t_1, t_2) = \langle \xi(t_1)\xi(t_2)\rangle$ is differentiable at the
point $t = t_1 = t_2$, i.e. the limit

\[
\lim_{h_1, h_2 \to 0}\frac{1}{h_1 h_2}\left[B(t - h_1, t - h_2) - B(t, t - h_2) - B(t - h_1, t) + B(t, t)\right], \tag{7.8}
\]


must exist. If the process $\xi(t)$ is wide-sense stationary, then $B(t, t - \tau) =
B(\tau)$ is a function of the time-difference only and the condition for
differentiability is simplified: The limit


\[
\lim_{h \to 0}\frac{1}{h^2}\left[B(h) - 2B(0) + B(-h)\right], \tag{7.9}
\]


must exist. Moreover, the autocorrelation function of the derivative $\xi'(t)$
is given by,


\[
B_{\xi'}(t_1, t_2) = \frac{\partial^2}{\partial t_1 \partial t_2} B(t_1, t_2),
\]


or, for a stationary process,


\[
B_{\xi'}(\tau) = -\frac{d^2}{d\tau^2} B(\tau).
\]


With these conditions in hand, it is straightforward to show that the 
Wiener process and the Ornstein-Uhlenbeck process are not differentiable 
(precisely Doob’s objection), but we shall also show that this does not 
matter in the least for the modeling of physical systems.


7.3.1 Wiener process is not differentiable


Recall from Chapter 6 (cf. Eq. 6.31 on page 134) that in the absence of 
a damping force, the Langevin equation reduces to the equation charac- 
terizing the Wiener process W(t), 


\[
\frac{dW}{dt} = \eta(t).
\]


We will now show that W(t) is not differentiable — and so the equation 
above, as it stands, is meaningless. In Chapter 6 (Eq. 6.32 on page 135), 
we derived the autocorrelation function for the Wiener process, 


\[
\langle W(t_1)W(t_2)\rangle = \min(t_1, t_2).
\]


Clearly, the autocorrelation function indicates that the process W(t) is 
nonstationary, so we must use Eq. 7.8 to check its differentiability: 


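\[
\lim_{h_1, h_2 \to 0}\frac{1}{h_1 h_2}\left[\min(t - h_1, t - h_2) - (t - h_2) - (t - h_1) + t\right]
 = \lim_{h_1, h_2 \to 0}\frac{1}{h_1 h_2}\left[\min(t - h_1, t - h_2) + h_2 + h_1 - t\right].
\]

For positive $h_1$ and $h_2$ the bracket equals $\min(h_1, h_2)$, so the quotient is
$1/\max(h_1, h_2)$, which grows without bound as $h_1, h_2 \to 0$.
The limit does not exist, and we must conclude that the Wiener process
is not differentiable (although one can easily show that it is mean-square
continuous).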
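
The divergence is easy to see numerically. The sketch below evaluates the double-difference quotient of Eq. 7.8 for the autocorrelation $\min(t_1, t_2)$ at shrinking step sizes; the function name `wiener_quotient` and the sample values are illustrative choices only.

```python
def wiener_quotient(t, h1, h2):
    """Double-difference quotient of Eq. 7.8 for B(t1, t2) = min(t1, t2)."""
    B = min  # autocorrelation of the Wiener process
    numerator = B(t - h1, t - h2) - B(t, t - h2) - B(t - h1, t) + B(t, t)
    return numerator / (h1 * h2)

if __name__ == "__main__":
    # The quotient behaves like 1 / max(h1, h2) and grows without bound.
    for h in (1e-1, 1e-2, 1e-3, 1e-4):
        print(h, wiener_quotient(1.0, h, h))
```

Each tenfold reduction of the step multiplies the quotient by roughly ten, so no finite limit is approached.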

The Ornstein-Uhlenbeck process is stationary at steady-state, and the
autocorrelation function,


\[
B(\tau) = \frac{e^{-|\tau|/\tau_c}}{2\tau_c},
\]

is a function of the time-difference $\tau$. We can therefore use Eq. 7.9 to
check its differentiability. One finds that the Ornstein-Uhlenbeck process
is likewise not differentiable (Exercise 3a). That was Doob's point in his
criticism of the work of Ornstein and Uhlenbeck (quoted on page 20).


BUT (and this is critically important) 


We shall now show that if the forcing function $F(t)$ is not strictly delta-
correlated, if the process has a non-zero correlation time (however small),
then the differential equation,


\[
\frac{dy}{dt} = -\gamma y + F(t), \tag{7.10}
\]

is well-defined and $y(t)$ is differentiable. For sake of example, suppose
$F(t)$ is an Ornstein-Uhlenbeck process. As shown above, the steady-state
correlation function for the Ornstein-Uhlenbeck process is the
exponential,


\[
B_F(\tau) = \frac{e^{-|\tau|/\tau_c}}{2\tau_c},
\]


where we have written the correlation time explicitly, and made the
pre-factor proportional to $1/\tau_c$ to clarify the correspondence between the
present example and the Wiener process studied above. Notice, in the
limit of vanishing correlation time,


\[
\lim_{\tau_c \to 0}\frac{1}{2\tau_c}e^{-|\tau|/\tau_c} = \delta(\tau).
\]


From Chapter 2 (Eq. 2.19 on page 42), the spectral density of the 
process y(t) (as characterized by Eq. 7.10) is simply, 


\[ S_{yy}(\omega) = \frac{1}{\omega^2 + \gamma^2}\,\frac{(1/\tau_c)^2}{\omega^2 + (1/\tau_c)^2}, \tag{7.11} \]

from which the autocorrelation function follows, 

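\[ B_y(\tau) = \frac{1}{2\gamma\left(1 - \gamma^2\tau_c^2\right)}\left[e^{-\gamma|\tau|} - \gamma\tau_c\, e^{-|\tau|/\tau_c}\right]. \tag{7.12} \]
Taking the limit (cf. Eq. 7.9),
\[ \lim_{h\to 0} \frac{B_y(h) - 2B_y(0) + B_y(-h)}{h^2} = -\frac{1}{2\tau_c}\,\frac{1}{1 + \gamma\tau_c}. \tag{7.13} \]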


The limit exists, so we conclude that the process y(t) defined by the
differential equation (7.10) is differentiable. Obviously, as
$\tau_c \to 0$, the limit above becomes undefined, but for any non-zero
correlation time, however small, the derivative of y(t) is well-defined and
can be manipulated using the rules of ordinary calculus. Incidentally,
Ornstein and Uhlenbeck never explicitly state that their forcing function
is delta-correlated, merely that it is very narrow (G. E. Uhlenbeck and
L. S. Ornstein (1930) Physical Review 36: 823-841):


[W]e will naturally make the following assumptions... There
will be correlation between the values of [the random forcing
function A(t)] at different times $t_1$ and $t_2$ only when
$|t_1 - t_2|$ is very small. More explicitly we shall suppose that:


\[ \langle A(t_1) A(t_2) \rangle = \phi(t_1 - t_2), \]
where $\phi(x)$ is a function with a very sharp maximum at x = 0.


As shown above, under these assumptions the derivation of Ornstein and 
Uhlenbeck is perfectly correct and no questions of inconsistency arise. 
This led Wang and Uhlenbeck to write as a footnote in a later publication 
(M. C. Wang and G. E. Uhlenbeck (1945) Reviews of Modern Physics 17: 
323-342), 


The authors are aware of the fact that in the mathematical 
literature (especially in papers by N. Wiener, J. L. Doob, and 
others; cf. for instance Doob, Ann. Math. 43, 351 (1942), 
also for further references) the notion of a random (or stochas- 
tic) process has been defined in a much more refined way. This 
allows for instance to determine in certain cases the probabil- 
ity that a random function y(t) is of bounded variation, or 
continuous or differentiable, etc. However, it seems to us that 
these investigations have not helped in the solution of prob- 
lems of direct physical interest, and we will, therefore, not try 
to give an account of them. 
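
The differentiability limit for y(t) discussed above can be checked
numerically. The sketch below is a minimal Python illustration. It assumes
the steady-state autocorrelation of y(t) has the form
B_y(tau) = [exp(-gamma|tau|) - gamma*tau_c*exp(-|tau|/tau_c)] / [2*gamma*(1 - gamma^2 tau_c^2)],
which is what Eq. 7.10 gives when the forcing has correlation
exp(-|tau|/tau_c)/(2 tau_c). The function names and parameter values are
illustrative choices. The second-difference quotient settles to
-1/[2 tau_c (1 + gamma tau_c)] as h shrinks, a value that is finite for any
tau_c > 0 but grows without bound as tau_c tends to zero.

```python
import math


def B_y(tau, gamma, tau_c):
    """Assumed steady-state autocorrelation of y(t) (requires gamma*tau_c != 1)."""
    prefactor = 1.0 / (2.0 * gamma * (1.0 - (gamma * tau_c) ** 2))
    return prefactor * (
        math.exp(-gamma * abs(tau)) - gamma * tau_c * math.exp(-abs(tau) / tau_c)
    )


def second_difference(h, gamma, tau_c):
    """[B_y(h) - 2 B_y(0) + B_y(-h)] / h^2, the quotient whose limit is taken."""
    return (
        B_y(h, gamma, tau_c) - 2.0 * B_y(0.0, gamma, tau_c) + B_y(-h, gamma, tau_c)
    ) / h**2


def predicted_limit(gamma, tau_c):
    """Closed-form value of the limit h -> 0."""
    return -1.0 / (2.0 * tau_c * (1.0 + gamma * tau_c))


if __name__ == "__main__":
    gamma, tau_c = 1.0, 0.1
    for h in (1e-2, 1e-3, 1e-4):
        print(h, second_difference(h, gamma, tau_c), predicted_limit(gamma, tau_c))
```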


7.4 Stochastic Integration 


Similar ideas can be used to define the integral of a stochastic process
$\xi(t)$:


Definition: A process $\xi(t)$ is mean-square integrable on the
interval (0,t) if, and only if, there exists a random variable,
denoted by $\xi^{(-1)}(t)$ or $\int_0^t \xi(u)\,du$, such that the limit


\[ \lim_{\varepsilon\to 0} \left\langle \left( \varepsilon \sum_{i=0}^{\lfloor t/\varepsilon \rfloor} \xi(i\varepsilon) - \int_0^t \xi(u)\,du \right)^{2} \right\rangle = 0, \tag{7.14} \]


exists (where the floor function $\lfloor\cdot\rfloor$ rounds down to the
nearest integer).


As above with differentiation, the definition can be recast as a
condition on the correlation function $B(t_1,t_2)$. Specifically, the limit
Eq. 7.14 exists if and only if $B(t_1,t_2)$ is Riemann-integrable on the
square (0,t) × (0,t); that is, the limit


\[ \int_0^t\!\!\int_0^t B(t_1,t_2)\,dt_1\,dt_2 = \lim_{\varepsilon\to 0}\left[\varepsilon^2 \sum_{i,j=0}^{\lfloor t/\varepsilon \rfloor} B(i\varepsilon, j\varepsilon)\right], \tag{7.15} \]


must exist. If the process $\xi(t)$ is wide-sense stationary, then the
corresponding limit for the stationary correlation function must exist,


\[ \int_0^t B(\tau)\,d\tau = \lim_{\varepsilon\to 0}\left[\varepsilon \sum_{i=0}^{\lfloor t/\varepsilon \rfloor} B(i\varepsilon)\right]. \tag{7.16} \]

For a test function f(t), a necessary condition for


\[ \int_a^b f(t)\,\xi(t)\,dt, \tag{7.17} \]


to be integrable in the mean-squared sense is that $\xi(t)$ is integrable;
or, if the integral is defined in the Lebesgue-Stieltjes sense, then the
measure $\xi(t)\,dt$ must be of bounded variation. That means, very roughly,
that the infinitesimal $\xi(t)\,dt$ must not be infinitely large in the
mean-squared sense.

Recall that the correspondence between the Langevin equation and the
Fokker-Planck equation is unambiguous, unless the white-noise forcing is
multiplied by a nonlinear function of the state variable,¹


\[ \frac{dy}{dt} = A(y) + c(y)\,\eta(t). \]

The question comes down to how one defines the terms in the equivalent
integrated equation,


\[ \int dy = \int A(y)\,dt + \int c(y)\,\eta(t)\,dt. \tag{7.18} \]


In particular, what precisely is the meaning of the last term,
$\int c(y)\eta(t)\,dt$? Notice that because white noise is delta-correlated,
i.e., $B(t_1,t_2) = \delta(t_1 - t_2)$, this process is not integrable.
Therefore, $\int c(y)\eta(t)\,dt$ is not integrable in the ordinary sense and
some interpretation must be provided, beyond the ordinary rules of
calculus. There are two interpretations that dominate the literature: the
Stratonovich interpretation and the Itô interpretation. It is useful to
compare the two viewpoints, contrasting how they were developed, and most
importantly, how they come into play in solving actual problems.


¹ For convenience, we call any Langevin equation (irrespective of whether
A(y) is nonlinear) a nonlinear Langevin equation if the forcing function
c(y) is a nonconstant function of the state variable y(t).


7.4.1 Itô and Stratonovich Integration


First, some notation. As we showed in Section 7.3, the Langevin equation 
is not well-defined since the process it characterizes is non-differentiable. 
The problem comes from the singularity of the forcing function’s
correlation $\langle \eta(t_1)\eta(t_2) \rangle = \delta(t_1 - t_2)$.²
Nevertheless, the integrated form of the equation is well-defined, once an
interpretation rule has been attached. It is therefore more correct
mathematically to write the Langevin equation as,


\[ dy = A(y)\,dt + c(y)\,\eta(t)\,dt. \tag{7.19} \]


Furthermore, recall from Section 6.4 on page 133 that a purely diffusive 
process (i.e. a Langevin equation without damping), is called Brownian 
motion or a Wiener process W(t), 


\[ \frac{dW}{dt} = \eta(t). \]


In the integrated form,
\[ dW(t) = \eta(t)\,dt. \]


Very often, Eq. 7.19 is written using the increment of the Wiener process, 
thus, 


dy = A(y)dt + c(y)dW(t). (7.20) 


It should be emphasized once again that Eqs. 7.19 and 7.20 have no
meaning until an interpretation rule is provided for
$\int c(y(t'))\,\eta(t')\,dt' = \int c(y(t'))\,dW(t')$. The two dominant
interpretations are discussed below.


1. Stratonovich interpretation. Stratonovich viewed the correlation
$\langle \eta(t_1)\eta(t_2) \rangle = \delta(t_1 - t_2)$ as an idealization
of a narrowly peaked correlation function. As we saw in Section 7.3, so
long as the correlation time $\tau_c$ is non-zero, the correlation function
is non-singular, and the Langevin equation can be manipulated using
ordinary calculus. In particular, with the change of variables (cf.
Eq. 6.60 on page 141),


\[ \tilde{y} = \int^{y} \frac{dy'}{c(y')}, \qquad \tilde{A}(\tilde{y}) = \frac{A(y)}{c(y)}, \qquad \tilde{P}(\tilde{y}) = P(y)\,c(y), \tag{7.21} \]


Stratonovich arrived at the following correspondence between the
Langevin and Fokker-Planck equations,


\[ dy = A(y)\,dt + c(y)\,dW(t) \qquad \text{(STRATONOVICH)} \tag{7.22} \]
\[ \Updownarrow \]
\[ \frac{\partial P(y,t)}{\partial t} = -\frac{\partial}{\partial y}\left\{\left[A(y) + \tfrac{1}{2}\,c(y)\,c'(y)\right] P(y,t)\right\} + \frac{1}{2}\frac{\partial^2}{\partial y^2}\left[c^2(y)\,P(y,t)\right]. \]


Typically, the function $c(y) \neq 0$ for any y; if it does vanish, more
sophisticated methods are required to patch together solutions in the
vicinity of the singular point. The above amounts to setting c(y)
equal to its mean value over the infinitesimal range of integration
in Eq. 7.20,


\[ y(t+dt) - y(t) = \int_t^{t+dt} A(y(t'))\,dt' + c\!\left(\frac{y(t) + y(t+dt)}{2}\right) \int_t^{t+dt} \eta(t')\,dt'. \tag{7.23} \]


² For simplicity, we set the variance of the white noise to 1, i.e., Γ = 1.

2. Itô interpretation. Itô began from an entirely different viewpoint.
In the same way that Lebesgue developed the ideas of set theory
to provide a more general definition of Riemann’s integral, Itô
extended the ideas of Lebesgue to include integration with the Wiener
process as the weighting function dW(t′). One can show that dW(t)
has unbounded variation which means, roughly speaking, that the
mean-squared length of the infinitesimal dW(t) is infinite, so Itô’s
taming of the wild Wiener process into a consistent definition of
stochastic integration is certainly a mathematical tour-de-force. His
development leads to the correspondence,


\[ dy = A(y)\,dt + c(y)\,dW(t) \qquad \text{(ITÔ)} \tag{7.24} \]
\[ \Updownarrow \]
\[ \frac{\partial P(y,t)}{\partial t} = -\frac{\partial}{\partial y}\left[A(y)\,P(y,t)\right] + \frac{1}{2}\frac{\partial^2}{\partial y^2}\left[c^2(y)\,P(y,t)\right]. \]


This amounts to setting c(y) equal to its initial value over the
infinitesimal range of integration in Eq. 7.20,


\[ y(t+dt) - y(t) = \int_t^{t+dt} A(y(t'))\,dt' + c(y(t)) \int_t^{t+dt} \eta(t')\,dt'. \tag{7.25} \]


It is important to note that in contrast to Stratonovich’s
interpretation, the Itô interpretation cannot be formulated unless the
noise is strictly delta-correlated. Transformations of variables under
Itô’s calculus are not the same as in ordinary calculus, and this is
often useful in proving various properties of the solution of Eq. 7.24
(see Appendix C).
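
The difference between the two rules can be made concrete with the
integral of W dW over (0, T). The following Python sketch is illustrative
only; the function name, step count and seed are arbitrary choices made
here. It builds one discretized Wiener path and forms two Riemann-type
sums, evaluating the integrand at the start of each step (Itô) and at the
average of the two endpoint values (Stratonovich). The Stratonovich sum
telescopes to W(T)²/2, the result of ordinary calculus, while the Itô sum
approaches [W(T)² − T]/2.

```python
import math
import random


def ito_and_stratonovich_sums(T=1.0, n_steps=100_000, seed=1):
    """Return (ito_sum, stratonovich_sum, W_T) for the integral of W dW on (0, T)."""
    rng = random.Random(seed)
    dt = T / n_steps
    W = 0.0
    ito = 0.0
    strat = 0.0
    for _ in range(n_steps):
        dW = rng.gauss(0.0, 1.0) * math.sqrt(dt)  # increment dW = n * dt^(1/2)
        ito += W * dW                             # integrand at the initial point
        strat += (W + 0.5 * dW) * dW              # integrand at the endpoint average
        W += dW
    return ito, strat, W


if __name__ == "__main__":
    ito, strat, W_T = ito_and_stratonovich_sums()
    print("Ito sum         :", ito, "  (W_T^2 - T)/2 =", 0.5 * (W_T**2 - 1.0))
    print("Stratonovich sum:", strat, "  W_T^2/2       =", 0.5 * W_T**2)
```

The two sums differ by half the accumulated squared increments, which is
close to T/2 for a fine partition; this is the discrete counterpart of the
extra drift term that separates the two Fokker-Planck equations.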


Notice that the Fokker-Planck equation does not require an
interpretation rule; it is simply a partial differential equation governing
the probability density. Furthermore, the Stratonovich and Itô
interpretations can be made to generate the same form for the
Fokker-Planck equation at the expense of re-defining the drift A(y),


\[ \text{(STRATONOVICH)} \quad \tilde{A}(y) = A(y) - \tfrac{1}{2}\,c(y)\,c'(y), \qquad\qquad \tilde{A}(y) = A(y) \quad \text{(ITÔ)}. \]


Obviously, if c(y) is constant (c′(y) = 0), then the two interpretations
coincide. A great deal is made about the distinction between Stratonovich
and Itô calculus in some circles, but in the end, what it all comes down
to is that no natural process is described by white noise, so the nonlinear
Langevin equation is, by construction, a gross simplification of whatever
process it is meant to represent. In the words of van Kampen ((1981)
Journal of Statistical Physics 24: 175-187),


The final conclusion is that a physicist cannot go wrong by 
regarding the Itô interpretation as one of those vagaries of the
mathematical mind that are of no concern to him. It merely 
served to point out that [the nonlinear Langevin equation] is 
not a meaningful equation, and thereby warn him against glib 
applications of the Langevin approach to nonlinear equations. 


Nevertheless, white noise is a useful construction in the development of
approximation methods, particularly in the numerical simulation of
stochastic differential equations (see Section 8.1). For further details
about Itô’s calculus, Gardiner’s “Handbook of stochastic methods” is
recommended.
Afterword

As mentioned above, dW(t) has unbounded variation so that the
mean-squared length of the infinitesimal dW(t) is infinite! That is clearly
problematic, and leads to $\int c(y(t'))\,dW(t')$ being undefined in any
ordinary sense. Nevertheless, Itô did for Lebesgue integration what
Lebesgue did for Riemann integration; namely, by providing a foundation
built upon the very deep ideas of set theory, pathological integrals such
as $\int c(y(t'))\,dW(t')$ are endowed with useful and self-consistent
meaning. Furthermore, using Itô’s calculus, various transformations and
theorems have been established that greatly simplify the analysis of
nonlinear Langevin equations. Said another way, Itô’s calculus is a useful
construction of pure mathematics that streamlines the proof of theorems
using the Wiener process as a canonical noise source. This streamlining is
so significant that often white noise sources are used to expedite the
analysis of various models.

From a physical point of view, however, white noise does not exist!
Any equation resembling the nonlinear Langevin equation is necessarily
an approximation of dynamics forced by a noise source with very short
(though nonzero) correlation time $\tau_c$. The formal limit of vanishing
correlation time $\tau_c$, from which white noise in the Stratonovich sense
is defined,


\[ \lim_{\tau_c \to 0} \frac{1}{2\tau_c}\, e^{-|\tau|/\tau_c} = \delta(\tau), \]


is best thought of as an asymptotic limit because our Markov assumption
relies upon a full relaxation of the microscopic variables over time scales
much shorter than $\tau_c$ (see Section 3.4 on page 59). In summary, Itô’s
calculus is lovely mathematics and can sometimes provide a short-cut to
lengthy computations, but it is a drastic approximation of real processes
and therefore the physical relevance of results derived in this fashion must
be viewed with skepticism.


Suggested References 


Much of the early content of this chapter comes from 


• Introduction to random processes (2nd Ed.), W. A. Gardner
(McGraw-Hill, 1990),


which is an excellent text on stochastic signal analysis directed toward an
electrical engineering audience. Gardiner’s Handbook has a large
collection of examples illustrating the range and applicability of Itô
calculus,


• Handbook of stochastic methods (3rd Ed.), C. W. Gardiner (Springer,
2004),


and devotes several pages to the derivations outlined in this chapter.
N. G. van Kampen’s master work,


• Stochastic processes in physics and chemistry (2nd Ed.), N. G. van
Kampen (North-Holland, 2001),


pulls no punches in a characteristic savaging of Itô calculus as it relates
to the modeling of physical processes (see p. 232-237 in that book). For
a more self-contained and detailed version of his argument,


• N. G. van Kampen (1981) “The validity of non-linear
Langevin-equations,” Journal of Statistical Physics 25: 431-442,


is highly recommended, along with


• N. G. van Kampen (1981) “Itô versus Stratonovich,” Journal of
Statistical Physics 24: 175-187.


Exercises 


1. Wide-sense stationary processes: For wide-sense stationary 
processes, the fundamental definitions of continuity, differentiabil- 
ity and integrability were re-cast in terms of the correlation function 
$B(\tau)$.

(a) Show that Eq. 7.6 follows from the definition of continuity for 
a wide-sense stationary process. 


(b) Show that Eq. 7.9 follows from the definition of differentiability 
for a wide-sense stationary process. 


(c) Show that Eq. 7.16 follows from the definition of integrability 
for a wide-sense stationary process. 

(d) Show that mean-squared differentiability implies mean-squared 
continuity. 


2. Orthogonality of a wide-sense stationary process: For the
wide-sense stationary process $\xi(t)$, show that


\[ \left\langle \xi(t)\,\frac{d\xi(t)}{dt} \right\rangle = 0. \]


3. Non-differentiability of the Ornstein-Uhlenbeck process.


(a) Show that the Ornstein-Uhlenbeck process is not differentiable.


(b) Explain, in words, why the damped differential equation for
y(t) (Eq. 7.10),


\[ \frac{dy}{dt} = -\gamma y + F(t), \]


forced with a nondifferentiable Ornstein-Uhlenbeck process F(t)
still results in y(t) being differentiable.


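4. Moments of an integrated process: Show that for the integrable
wide-sense stationary process $\xi(t)$, the integrated process
$Y(t) = \int_0^t \xi(u)\,du$ has mean and variance given by,


\[ \langle Y(t) \rangle = \langle \xi \rangle\, t, \qquad \left\langle \left( Y(t) - \langle Y(t) \rangle \right)^2 \right\rangle = t \int_{-t}^{t} \left( 1 - \frac{|\tau|}{t} \right) C_\xi(\tau)\,d\tau, \]


where $C_\xi(\tau) = \langle \xi(t)\,\xi(t-\tau) \rangle - \langle \xi \rangle^2$.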
5. Itô and Stratonovich interpretations:
(a) Show that with Itô’s interpretation, $\langle c(y)\eta(t) \rangle = 0$,
while the Stratonovich interpretation gives
$\langle c(y)\eta(t) \rangle = \tfrac{1}{2} \langle c'(y)\,c(y) \rangle$.
(b) For the following Langevin equation,


\[ \frac{dx}{dt} = -x\,\eta(t), \qquad x(0) = x_0, \tag{7.26} \]

where $\eta(t)$ is Gaussian white noise, what is $\langle x(t) \rangle$ with
Eq. 7.26,


i. interpreted in the Stratonovich sense.
ii. interpreted in the Itô sense.

CHAPTER 8


RANDOM DIFFERENTIAL EQUATIONS


The ‘repeated-randomness assumption’ (or the Stosszahlansatz, Section 3.4
on page 59) allows many degrees of microscopic freedom in the model
dynamics to be eliminated, leaving behind a Markov process governed by a
master equation. In some cases, this coarse-graining is continued, and it
is possible to identify a subset of variables whose stochastic properties
are not influenced by the variables of interest. In the extreme case, all of
the stochastic character comes from these external variables, and the
master equation reduces to a differential equation with random coefficients
(a random differential equation). Noise of this type is usefully denoted
‘extrinsic’ or ‘external’ noise (in contrast with ‘intrinsic’ noise where the
fluctuations are inherent to the dynamics, and the master equation cannot
be further reduced). One example of extrinsic noise is Brownian motion,
where the motion of the solvent molecules gives rise to a random forcing
of the solute particle, although their motion is not affected by the solute.
Another example is an electromagnetic wave passing through a turbulent
troposphere: the wave obeys Maxwell’s equations with a random dielectric
constant (leading to a random coefficient in the differential equation),
justified by the observation that the passing wave has negligible effect on
the turbulent atmosphere.

In much of the current literature, the term stochastic differential
equation is used almost exclusively to denote Itô’s version of the
nonlinear Langevin equation,
\[ dy(t) = A(y)\,dt + c(y)\,dW(t), \]


where $dW(t) = \eta(t)\,dt$ is the increment of the Wiener process coming
from a Gaussian white noise source $\eta(t)$. When the random force
$d\xi(t)$ is not white noise, the problem of interpretation (Itô versus
Stratonovich) does not occur because $\int c(y)\,d\xi(t)$ is well-defined as
an ordinary (Lebesgue-Stieltjes) integral.

If one’s focus is upon problems arising from physical systems (i.e.
problems in science and engineering), the emphasis on Itô’s equation and
Itô calculus is misdirected because it is the fact that dW(t) comes from a
white noise source that creates all the problems. In a manner of speaking,
these difficulties are artificial and disappear when one takes into account
that a random force in physics is never really white noise, but has (at
best) a very short correlation time. Once that is accepted, one also gets
rid of Doob’s objection arising from the non-existence of the derivative
in the Langevin equation, since a non-white noise forcing does not affect
the differentiability of the process. Consequently, stochastic differential
equations can be formally written as ordinary differential equations,


\[ \frac{du}{dt} = F(u, t; Y(t)), \qquad u(0) = u_0, \tag{8.1} \]

where u and F may be vectors, and Y(t) is a random function whose
stochastic properties are given and whose correlation time is non-zero.
Roughly speaking, random differential equations can be subdivided
into several classes,


1. linear random differential equations where only the forc- 
ing term is a random function, as in the Langevin equa- 
tion (additive noise). 

2. linear random differential equations where one or more 
of the coefficients are random functions (multiplicative 
noise). 

3. nonlinear random differential equations. 


4. other variations, random partial differential equations, 
etc. 


Additive noise was dealt with in great detail in Chapter 1. Multiplicative 
noise will be dealt with in the following sections. Nonlinear stochastic 
differential equations and partial differential equations are more advanced 
than the level of this course. 
8.1 Numerical Simulation


• C. W. Gardiner, Handbook of stochastic methods (3rd Ed.), (Springer, 2004).


• D. T. Gillespie, Markov processes: An introduction for physical scientists,
(Academic Press, 1992).


• D. S. Lemons, An introduction to stochastic processes in physics, (Johns
Hopkins University Press, 2002).


The typical situation in physics is that the physical world, or at least 
some large system, is subdivided into a subsystem and its environment 
(see Section 3.4 on page 59). The influence of the environment on a 
subsystem is treated like a heat bath, viz. as a random force whose 
stochastic properties are given (or assumed). Hence the equations of 
motion of the total system are reduced to those of the subsystem at the 
expense of introducing random coefficients. 

The system described by Eq. 8.1 determines for each particular
realization y(t) of the random function Y(t) a functional $U([y],t,u_0)$
which depends upon all values y(t′) for 0 ≤ t′ ≤ t. The ensemble of
solutions $U([y],t,u_0)$ for all possible realizations y(t′) constitutes a
stochastic process. The random differential equation is solved when the
stochastic properties of this process have been found; this can rarely be
done in practice, and so approximation methods are necessary.

As an example of the treatment of random differential equations, we
concentrate on the class of multiplicative noise processes, i.e. those
random differential equations with random coefficients. Even for this
comparatively simple class, the theory is very incomplete and a general
solution is out of the question. Instead, we focus upon the first two
moments, and specifically on the behaviour of the average
$\langle u(t) \rangle$. In the following sections, we briefly discuss how
numerical solutions of random differential equations should be computed,
then turn attention to analytic approximation schemes for generating closed
equations for the moments of u(t).


White noise 


Numerical simulation of random differential equations begins with a dis- 
cretization of the Langevin equation, 


\[ dy = A(y,t)\,dt + c(y,t)\,\eta(t)\,dt, \]


where $\eta(t)$ is Gaussian white noise.


Figure 8.1 (flowchart of the simulation algorithm):

1. Initialize: $t \leftarrow t_0$, $y \leftarrow y_0$.
2. Choose a suitably small $\Delta t > 0$. (a)
3. Draw a sample value n of the unit normal random variable N(0,1). (b)
4. Advance the process:
   $y \leftarrow y + n\,c(y,t)\,[\Delta t]^{1/2} + A(y,t)\,\Delta t$,
   $t \leftarrow t + \Delta t$. (c)
5. Record y(t) = y as required for sampling or plotting. If the process is
   to continue, then return to 2 or 3; (d) otherwise, stop.

Figure 8.1: Simulation of a trajectory drawn from $p(y,t|y_0,t_0)$
satisfying the Fokker-Planck equation
$\partial p/\partial t = -\partial_y[A(y,t)\,p] + \tfrac{1}{2}\partial_y^2[c^2(y,t)\,p]$.
a) Should satisfy the first-order accuracy condition (Eq. 8.5) for some
$\varepsilon_1 \ll 1$. Optionally, may satisfy the plotting condition
(Eq. 8.6) for some $\varepsilon_2 < 1$. b) Use a suitable method to generate
the normally distributed random number (see Exercise 3). c) If
$(t_{max} - t_0)/\Delta t \sim 10^K$, then the sum $t + \Delta t$ should be
computed with at least K + 3 digits of precision. d) The value of
$\Delta t$ may be reset at the beginning of any cycle. Taken from Gillespie
(1992), p. 194.


One can show that the integrated process dy (interpreted in the Itô sense)
is equivalent to the equation (Exercise 2a),
\[ dy = A(y,t)\,dt + N(0, c^2(y,t)\,dt), \tag{8.2} \]


where $N(\mu,\sigma^2)$ is a Gaussian (normal) distribution with mean
$\mu$ and variance $\sigma^2$ (see Section A.2 on page 265). From the
properties of the Gaussian distribution, we can re-write Eq. 8.2 as,


\[ dy = A(y,t)\,dt + n \cdot c(y,t)\sqrt{dt}, \tag{8.3} \]


where now n is a number drawn from a unit normal distribution N(0,1).
Several remarks are in order:


1. For the Wiener process (A = 0, c = 1), Eq. 8.3 reduces to,
\[ dW(t) = n \cdot dt^{1/2}, \]


a relation that is exploited in many theorems using Itô’s calculus
(see Gardiner (2004) and Appendix C for examples).


2. The Wiener process is continuous, so
\[ W(t+\Delta t) - W(t) = \left[ W(t+\Delta t) - W\!\left(t + \tfrac{\Delta t}{2}\right) \right] + \left[ W\!\left(t + \tfrac{\Delta t}{2}\right) - W(t) \right]. \]


From Eq. 8.3,


\[ \sqrt{\Delta t}\; N_{t}^{t+\Delta t}(0,1) = \sqrt{\frac{\Delta t}{2}}\; N_{t+\Delta t/2}^{t+\Delta t}(0,1) + \sqrt{\frac{\Delta t}{2}}\; N_{t}^{t+\Delta t/2}(0,1). \]


The sub- and superscripts of $N_{t}^{t+\Delta t}$ indicate that the random
variable is explicitly associated with the interval $(t, t+\Delta t)$. This
relation is only satisfied if the normal distributions are independent of
one another for non-overlapping intervals. We say, therefore, that the
Wiener process has independent increments.


The integration of Eq. 8.3 over a small, but finite, time interval At 
provides a practical numerical simulation algorithm, 


\[ y(t+\Delta t) = y(t) + A(y,t)\,\Delta t + n \cdot c(y,t)\sqrt{\Delta t}, \tag{8.4} \]


(see Figure 8.1). The deviation of the numerical scheme from the
Chapman-Kolmogorov equation quantifies the error involved; to minimize
this type of consistency error (i.e. to keep the error higher order in
$\Delta t$), the time step $\Delta t$ must satisfy the accuracy condition
($\varepsilon_1 \ll 1$),


\[ \Delta t \le \min\left[ \frac{\varepsilon_1}{\left|\partial_y A(y,t)\right|},\; \left( \frac{\varepsilon_1}{\partial_y c(y,t)} \right)^{2} \right]. \tag{8.5} \]


Furthermore, to generate a representative plot of the simulation results,
the data must be stored with a resolution such that the fluctuations are
captured on a time-scale much shorter than the evolution due to the
drift, resulting in an additional plotting condition on the storage time
step ($\varepsilon_2 < 1$),


\[ \Delta t \lesssim \frac{\varepsilon_2^2\, c^2(y,t)}{A^2(y,t)}. \tag{8.6} \]
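
The stepping rule of Eq. 8.4 and the loop of Figure 8.1 can be sketched in
a few lines of Python. This is a minimal illustration, not Gillespie's
code: the function names, the fixed time step, and the test problem
(A = −y, c = 1, an Ornstein-Uhlenbeck-type process) are assumptions made
here. The time variable is recomputed as t0 + k·Δt rather than accumulated,
which is one simple way to respect the round-off caution in note (c) of
Figure 8.1.

```python
import math
import random


def euler_step(y, t, dt, A, c, n):
    """One update of Eq. 8.4: y + A(y,t)*dt + n*c(y,t)*sqrt(dt)."""
    return y + A(y, t) * dt + n * c(y, t) * math.sqrt(dt)


def simulate(A, c, y0, t0, t_max, dt, rng):
    """Follow the steps of Figure 8.1 with a fixed time step; return (times, values)."""
    n_steps = int(round((t_max - t0) / dt))
    t, y = t0, y0
    times, values = [t], [y]
    for k in range(n_steps):
        n = rng.gauss(0.0, 1.0)      # sample of the unit normal N(0,1)
        y = euler_step(y, t, dt, A, c, n)
        t = t0 + (k + 1) * dt        # avoids accumulating round-off in t
        times.append(t)
        values.append(y)
    return times, values


if __name__ == "__main__":
    rng = random.Random(0)
    A = lambda y, t: -y              # illustrative linear damping
    c = lambda y, t: 1.0             # illustrative constant noise amplitude
    finals = [simulate(A, c, 1.0, 0.0, 1.0, 0.01, rng)[1][-1] for _ in range(2000)]
    print("sample mean at t = 1:", sum(finals) / len(finals))
    print("exact mean  at t = 1:", math.exp(-1.0))
```

With c set to zero the routine is the forward Euler method for the
deterministic equation, which gives a quick consistency check before any
noise is switched on.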


Higher-order schemes 


• S. Asmussen and P. W. Glynn, Stochastic simulation: Algorithms and analysis,
(Springer, 2007).


The simulation algorithm, Eq. 8.4, in the absence of diffusion
(c(y,t) = 0), reduces to the simple forward Euler method for the numerical
solution of ordinary differential equations. One can show that the scheme
described above exhibits ½-order strong convergence and 1st-order weak
convergence. The question that immediately presents itself is whether
higher-order schemes, such as Runge-Kutta or Adams-Bashforth methods,
which have proved so useful in numerical solution of ordinary differential
equations, can be extended to stochastic systems. For the Langevin
equation, it would seem they cannot without considerable complication of
the stepping algorithm. The major difficulty is not so much the
stochasticity, but rather that the trajectory governed by the Langevin
equation has no well-defined derivative, frustrating the derivation of
higher-order methods. We will briefly outline how higher-order explicit
schemes can be derived, although we shall see that very little is gained
from the added complexity of the algorithm.

The main error associated with the Euler scheme comes from the
approximation,
\[ \int_0^{\Delta t} c(y(t),t)\,dW(t) \approx c(y(0),0)\,W(\Delta t). \]
To improve upon the algorithm, higher-order terms are included in the
estimate. Re-writing


\[ \int_0^{\Delta t} c(y(t),t)\,dW(t) - c(y(0),0)\,W(\Delta t) = \int_0^{\Delta t} \left\{ c(y(t),t) - c(y(0),0) \right\} dW(t), \tag{8.7} \]
we expand c(y(t),t) using Itô’s formula (see Appendix C, p. 302). The
result is Milstein’s scheme (where again n is a sample of a unit Normal
distribution N(0,1); the same sample appears in both noise terms of a given
step, since both derive from the single increment
$\Delta W = n\sqrt{\Delta t}$),


\[ y(t+\Delta t) = y(t) + A(y,t)\,\Delta t + n \cdot c(y,t)\sqrt{\Delta t} - \frac{\Delta t}{2}\, c(y,t) \cdot c_y(y,t) \cdot (1 - n^2), \tag{8.8} \]


which converges strongly with order 1. Higher-order schemes can be
generated, but they become increasingly unwieldy with only moderate
improvement of convergence. A pragmatic approach is to use coloured noise
and higher-order stepping schemes, although convergence order can rarely
be computed explicitly for these schemes.
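
The gain from the extra Milstein term can be seen on a problem with a
known solution. The Python sketch below is illustrative only: the test
equation dy = μ y dt + σ y dW (Itô sense), its exact solution
y(t) = y(0) exp[(μ − σ²/2) t + σ W(t)], and all function names and
parameter values are assumptions chosen here for the comparison. Both
schemes are driven by the same noise samples, and the mean absolute error
at the final time is reported for each.

```python
import math
import random


def euler_update(y, dt, n, mu, sigma):
    """Euler step (Eq. 8.4) with A = mu*y and c = sigma*y."""
    return y + mu * y * dt + n * sigma * y * math.sqrt(dt)


def milstein_update(y, dt, n, mu, sigma):
    """Milstein step with c*c_y = sigma^2*y; the same n enters both noise terms."""
    return euler_update(y, dt, n, mu, sigma) - 0.5 * dt * sigma**2 * y * (1.0 - n * n)


def mean_endpoint_errors(mu=0.5, sigma=1.0, T=1.0, n_steps=100, n_paths=500, seed=2):
    """Mean |numerical - exact| at time T for (Euler, Milstein), y(0) = 1."""
    rng = random.Random(seed)
    dt = T / n_steps
    err_euler = 0.0
    err_milstein = 0.0
    for _ in range(n_paths):
        y_e = y_m = 1.0
        W = 0.0
        for _ in range(n_steps):
            n = rng.gauss(0.0, 1.0)
            y_e = euler_update(y_e, dt, n, mu, sigma)
            y_m = milstein_update(y_m, dt, n, mu, sigma)
            W += n * math.sqrt(dt)
        exact = math.exp((mu - 0.5 * sigma**2) * T + sigma * W)
        err_euler += abs(y_e - exact) / n_paths
        err_milstein += abs(y_m - exact) / n_paths
    return err_euler, err_milstein


if __name__ == "__main__":
    e, m = mean_endpoint_errors()
    print("mean strong error, Euler   :", e)
    print("mean strong error, Milstein:", m)
```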


Coloured noise 


Coloured noise is often simulated by augmenting the model equations to 
include a Langevin equation governing an Ornstein-Uhlenbeck process. 
For example, the equation 


\[ \frac{dy}{dt} = A(y,t) + c(y,t) \cdot F(t), \tag{8.9} \]

where F(t) is an Ornstein-Uhlenbeck process, is simulated by the
augmented system,


\[ \frac{dy}{dt} = A(y,t) + c(y,t) \cdot F(t), \]
\[ \frac{dF}{dt} = -\frac{1}{\tau_c} F(t) + \frac{1}{\tau_c}\,\eta(t), \]


where $\eta(t)$ is zero-mean, unit-variance Gaussian white noise, and
$\tau_c$ is the correlation time of the coloured noise process. Written in
this way, the steady-state autocorrelation function of F(t) is


\[ \langle F(t)\,F(t-\tau) \rangle = \frac{e^{-|\tau|/\tau_c}}{2\tau_c} \;\xrightarrow{\;\tau_c \to 0\;}\; \delta(\tau). \]
An improved approach comes from noting that the Ornstein-Uhlenbeck
process is Gaussian. We can therefore write the exact stepping formula
for F(t) as (Exercise 2b),


\[ F(t+\Delta t) = F(t) \cdot e^{-\Delta t/\tau_c} + n \cdot \left[ \frac{1}{2\tau_c} \left( 1 - e^{-2\Delta t/\tau_c} \right) \right]^{1/2}, \tag{8.10} \]


where, again, n is a unit normal random variable. The process governed
by Eq. 8.9 is differentiable (see Eq. 7.13 on page 157), so we are free to
use any stepping scheme we wish to advance y. That is,


\[ y(t+\Delta t) = h\left[ y(t), \ldots, y(t+\Delta t), F(t) \right], \]
\[ F(t+\Delta t) = F(t) \cdot e^{-\Delta t/\tau_c} + n \cdot \left[ \frac{1}{2\tau_c} \left( 1 - e^{-2\Delta t/\tau_c} \right) \right]^{1/2}, \]


where h[·] is any implicit or explicit stepping scheme used to integrate
Eq. 8.9. Echoing the analysis of Chapter 7, we again find that the
Langevin equation is difficult to deal with because it is a differential
equation characterizing a non-differentiable process. This paradox
disappears when we replace the canonical white noise forcing by a more
physically relevant coloured noise. We are then able to use all the
methods developed in ordinary calculus, including the numerical integration
of differential equations. Although it should not need saying, notice that
in the limit $\tau_c \to 0$, Eq. 8.9 reduces to the Langevin equation
interpreted in the Stratonovich sense! In the next section, we shall discuss
various approximation schemes for treating equations such as Eq. 8.9 under
various assumptions about the correlation time $\tau_c$ and the magnitude
of the noise (focusing exclusively on models with A and c linear in y).
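
A coloured-noise simulation along these lines can be sketched as follows.
The Python code is a minimal illustration under stated assumptions: the
exact update of Eq. 8.10 is used for F, and a second-order
predictor-corrector (Heun) step is chosen here as one possible h[·]; that
choice, the use of both F(t) and F(t+Δt) inside the step, and all function
names are assumptions made for the example, not prescriptions of the text.

```python
import math
import random


def ou_coefficients(dt, tau_c):
    """Decay factor and noise amplitude of the exact update, Eq. 8.10."""
    decay = math.exp(-dt / tau_c)
    amplitude = math.sqrt((1.0 - decay**2) / (2.0 * tau_c))
    return decay, amplitude


def ou_step(F, dt, tau_c, n):
    """Advance the Ornstein-Uhlenbeck force by dt using a unit normal sample n."""
    decay, amplitude = ou_coefficients(dt, tau_c)
    return F * decay + n * amplitude


def heun_step(y, t, dt, A, c, F_now, F_next):
    """Illustrative h[.]: predictor-corrector step for dy/dt = A + c*F (Eq. 8.9)."""
    slope_now = A(y, t) + c(y, t) * F_now
    y_pred = y + dt * slope_now
    slope_next = A(y_pred, t + dt) + c(y_pred, t + dt) * F_next
    return y + 0.5 * dt * (slope_now + slope_next)


def simulate(A, c, y0, tau_c, dt, n_steps, seed=3):
    """Return the list of y values for one realization of the coloured noise."""
    rng = random.Random(seed)
    # Start F in its steady state: zero mean, variance 1/(2*tau_c).
    F = rng.gauss(0.0, 1.0) * math.sqrt(1.0 / (2.0 * tau_c))
    y, t = y0, 0.0
    path = [y]
    for k in range(n_steps):
        F_next = ou_step(F, dt, tau_c, rng.gauss(0.0, 1.0))
        y = heun_step(y, t, dt, A, c, F, F_next)
        F, t = F_next, (k + 1) * dt
        path.append(y)
    return path


if __name__ == "__main__":
    path = simulate(lambda y, t: -y, lambda y, t: y, 1.0, tau_c=0.05, dt=0.005, n_steps=400)
    print("y at final time:", path[-1])
```

Because the update for F is exact, the time step is limited only by the
accuracy of the scheme chosen for y, and the stationary variance
1/(2 tau_c) of F is preserved for any step size.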


8.2 Approximation Methods 


There are several approximation methods that have been developed to 
derive evolution equations for the moments of a process governed by a 
linear stochastic differential equation. We shall consider approximation 
methods that assume either very long or very short correlation time for 
the fluctuations, and we shall limit the analysis to the derivation of leading 
order terms. 
8.2.1 Static Averaging (Long correlation time)


Consider the stochastic differential equation,


where $\xi$ is a stochastic process. It can happen that the fluctuations
have a very long correlation time compared with the rapid deterministic
relaxation of the system. In that case, we may treat the fluctuating
coefficients as time-independent random variables. The differential
equation is solved in the ordinary fashion to obtain a solution
$u(t;\xi)$ that depends parametrically upon the fluctuating term, then the
solution is averaged over the probability distribution $f(\xi)$ of the
fluctuating coefficient,


\[ \langle u(t) \rangle = \int f(\xi)\,u(t;\xi)\,d\xi. \]


We shall call this the static averaging approximation. An example will 
make the procedure clear. 
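Example: The static averaging approximation is well-illustrated by the
complex harmonic oscillator with random frequency,

\[ \frac{du}{dt} = -i\xi u; \qquad u(0) = u_0, \tag{8.11} \]
where $\xi$ is a random variable. Because $\xi$ does not depend on time, we
can solve the random differential equation explicitly, i.e. solve for
arbitrary $\xi$, and average over the ensemble. Remember, we assume from the
outset that the entire probability distribution for $\xi$ is known. The
solution is,


\[ u(t) = u_0\,e^{-i\xi t}, \qquad \text{so that} \qquad \langle u(t) \rangle = u_0 \int f(\xi)\,e^{-i\xi t}\,d\xi. \]


It is instructive to look at a few special cases for the probability
distribution $f(\xi)$; in the following list $\xi_0$ and $\gamma$ are fixed
parameters.


1. Cauchy/Lorentz:


\[ f(\xi) = \frac{\gamma/\pi}{\gamma^2 + (\xi - \xi_0)^2}, \qquad \langle u \rangle = u_0\,e^{-i\xi_0 t - \gamma t}. \]


2. Gaussian:


\[ f(\xi) = \frac{1}{\sqrt{2\pi\gamma^2}} \exp\left[ -\frac{(\xi - \xi_0)^2}{2\gamma^2} \right], \qquad \langle u \rangle = u_0\,e^{-i\xi_0 t - \gamma^2 t^2/2}. \]


3. Laplace:


\[ f(\xi) = \frac{\gamma}{2}\,e^{-\gamma|\xi - \xi_0|}, \qquad \langle u \rangle = u_0\,e^{-i\xi_0 t}\,\frac{\gamma^2}{\gamma^2 + t^2}. \]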

In each case, the averaged solution tends to zero as $t \to \infty$. This
damping is due to the fact that the harmonic oscillators of the ensemble
gradually lose the phase coherence they had at t = 0. In plasma physics
this is called “phase mixing”, in mathematics the “Riemann-Lebesgue
theorem”. The modulus of u is not subject to phase mixing and does not
tend to zero; in fact,


\[ |u(t)| = |u_0|, \qquad \text{hence} \qquad \langle |u(t)|^2 \rangle = |u_0|^2. \]


The form of the damping factor is determined by $f(\xi)$. Only for one
particular $f(\xi)$ does it have the form that corresponds to a complex
frequency. Note that $\langle u(t) \rangle$ is identical (up to the factor
$u_0$) with the characteristic function $\chi(t)$ of the distribution
$f(\xi)$. The fact that $\langle u(t) \rangle = u_0\,\chi(t)$ suggests the
use of the cumulant expansion


\[ \langle u(t) \rangle = u_0 \left\langle e^{-i\xi t} \right\rangle = u_0 \exp\left[ \sum_{m=1}^{\infty} \frac{(-it)^m}{m!}\,\kappa_m \right], \]


where $\kappa_m$ stands for the m-th cumulant, and indeed it is this idea
that underlies the approximation discussed in Section 8.2.3.
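
The static average is easy to check by direct quadrature. The Python
sketch below is illustrative: the function names, the trapezoidal rule and
the integration window are choices made here. It averages
u0 exp(−i ξ t) over a Gaussian and over a Laplace density, each centred at
ξ0 with scale parameter γ, and compares the result with the closed forms
exp(−i ξ0 t − γ² t²/2) and exp(−i ξ0 t) γ²/(γ² + t²). The Cauchy/Lorentz
case is left out only because its heavy tails make a truncated quadrature
converge slowly.

```python
import cmath
import math


def averaged_solution(f, t, u0=1.0, center=0.0, half_width=40.0, n=80_000):
    """Trapezoidal estimate of <u(t)> = u0 * integral of f(xi) * exp(-i*xi*t) d(xi)."""
    h = 2.0 * half_width / n
    total = 0.0 + 0.0j
    for k in range(n + 1):
        xi = center - half_width + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(xi) * cmath.exp(-1j * xi * t)
    return u0 * total * h


def gaussian(xi0, g):
    """Gaussian density with mean xi0 and standard deviation g."""
    return lambda xi: math.exp(-((xi - xi0) ** 2) / (2.0 * g * g)) / math.sqrt(2.0 * math.pi * g * g)


def laplace(xi0, g):
    """Laplace density centred at xi0 with decay rate g."""
    return lambda xi: 0.5 * g * math.exp(-g * abs(xi - xi0))


if __name__ == "__main__":
    xi0, g, t = 2.0, 1.0, 1.5
    print("Gaussian:", averaged_solution(gaussian(xi0, g), t, center=xi0),
          cmath.exp(-1j * xi0 * t - 0.5 * g * g * t * t))
    print("Laplace :", averaged_solution(laplace(xi0, g), t, center=xi0),
          cmath.exp(-1j * xi0 * t) * g * g / (g * g + t * t))
```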


8.2.2 Bourret’s Approximation (Short correlation time) 


• R. C. Bourret (1965) “Ficton theory of dynamical systems with noisy
parameters,” Canadian Journal of Physics 43: 619-639.


• A. Brissaud and U. Frisch (1974) “Solving linear stochastic differential
equations,” Journal of Mathematical Physics 15: 524-534.


• N. G. van Kampen (1976) “Stochastic differential equations,” Physics
Reports 24: 171-228.


Consider the linear random differential equation, 


\[ \frac{du}{dt} = \left[ A_0 + \alpha A_1(t) \right] u; \qquad u(0) = u_0, \tag{8.12} \]
where u is a vector, $A_0$ is a constant matrix, $A_1(t)$ is a random
matrix, and $\alpha$ is a parameter measuring the magnitude of the
fluctuations in the coefficients. If the mathematical model is chosen
properly, then $\alpha$ is small. We would like to find deterministic
equations for the moments of u. Notice that if we simply average the
stochastic differential equation, we obtain


\[ \frac{d\langle u \rangle}{dt} = A_0 \langle u \rangle + \alpha \langle A_1(t)\,u \rangle, \]


but we have no obvious method to evaluate the cross-correlation
$\langle A_1(t)\,u \rangle$. The methods derived in this section are
approximations of that cross-correlation. Concerning the random matrix
$A_1(t)$, we make the following assumptions:


1. $\langle A_1(t)\rangle = 0$: This is not strictly necessary, but it simplifies
the discussion. It is, for example, true if $A_1(t)$ is
stationary, for then $\langle A_1(t)\rangle$ is independent of $t$ and can
be absorbed into $A_0$.


2. $A_1(t)$ has a finite (non-zero) correlation time $\tau_c$: This means
that for any two times $t_1$, $t_2$, such that $|t_1 - t_2| \gg \tau_c$, one
may treat all matrix elements of $A_1(t_1)$ and of $A_1(t_2)$ as
statistically independent.


First, we eliminate $A_0$ from the random differential equation by setting
$u(t) = e^{tA_0} v(t)$.
Substitution into Eq. 8.12 yields the new equation for $v(t)$,

$$ \frac{dv}{dt} = \alpha V(t)\, v(t), \tag{8.13} $$

where $V(t) = e^{-tA_0} A_1(t)\, e^{tA_0}$, and $v(0) = u(0) = u_0$. Since $\alpha$ is small,
the obvious method for solving this problem seems to be a perturbation
series in $\alpha$, i.e. assume a solution of the form,

$$ v(t) = v_0(t) + \alpha v_1(t) + \alpha^2 v_2(t) + \cdots. $$

Substituting into Eq. 8.13,

$$ v(t) = u_0 + \alpha \left(\int_0^t V(t_1)\,dt_1\right) u_0 + \alpha^2 \left(\int_0^t\!\!\int_0^{t_1} V(t_1) V(t_2)\,dt_2\,dt_1\right) u_0 + \cdots. \tag{8.14} $$

Upon taking the average, with fixed $u_0$ (the first-order term drops out because $\langle V(t)\rangle = 0$),

$$ \langle v(t)\rangle = u_0 + \alpha^2 \left(\int_0^t\!\!\int_0^{t_1} \langle V(t_1) V(t_2)\rangle\,dt_2\,dt_1\right) u_0 + \cdots. \tag{8.15} $$

Dropping all higher-order terms, we have Born’s iterative approximation.
One could claim to have an approximation for $\langle v\rangle$ to second order in $\alpha$;
however, this is not a suitable perturbation scheme because the successive
terms are not merely of increasing order in $\alpha$, but also in $t$. That is, the
expansion is actually in powers of $(\alpha t)$ and is therefore valid only for a
limited time, i.e. $\alpha t \ll 1$.
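
To make the failure of the Born truncation concrete, here is a minimal sketch for a scalar special case. The setup is an assumption made for illustration and does not come from the text: $V(t) = -i\xi_1(t)$ with $\xi_1$ a zero-mean Gaussian process with covariance $e^{-|\tau|/\tau_c}$, and $u_0 = 1$. For Gaussian noise the exact average is $\exp[-\alpha^2 I(t)]$ with $I(t) = \int_0^t\!\int_0^{t_1} e^{-(t_1-t_2)/\tau_c}\,dt_2\,dt_1$, while the truncated series (Eq. 8.15) is $1 - \alpha^2 I(t)$.

```python
import math


def double_integral(t, tau_c):
    # I(t) = int_0^t int_0^{t1} exp(-(t1 - t2)/tau_c) dt2 dt1, in closed form.
    return tau_c * t - tau_c**2 * (1.0 - math.exp(-t / tau_c))


def born_second_order(t, alpha, tau_c):
    # Series of Eq. 8.15 truncated after the alpha^2 term (scalar case, u0 = 1).
    return 1.0 - alpha**2 * double_integral(t, tau_c)


def exact_gaussian(t, alpha, tau_c):
    # Exact average; valid only under the Gaussian assumption stated above.
    return math.exp(-alpha**2 * double_integral(t, tau_c))


ALPHA, TAU_C = 0.5, 0.1  # illustrative values only
for t in (0.5, 5.0, 50.0, 200.0):
    print(t, born_second_order(t, ALPHA, TAU_C), exact_gaussian(t, ALPHA, TAU_C))
```

At early times the two expressions agree closely; at late times the truncated series keeps decreasing linearly and becomes negative, while the exact average stays between 0 and 1.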

A different approach was taken by Bourret. Starting from Eq. 8.13, 
the equation can be re-written in an equivalent integrated form, 


$$ v(t) = u_0 + \alpha \int_0^t V(t')\, v(t')\,dt'. \tag{8.16} $$


By iteration, this equation can be re-written as, 


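$$ \begin{aligned} v(t) &= u_0 + \alpha \int_0^t V(t') \left\{ u_0 + \alpha \int_0^{t'} V(t'')\, v(t'')\,dt'' \right\} dt' \\ &= u_0 + \alpha \int_0^t V(t')\,dt'\; u_0 + \alpha^2 \int_0^t\!\!\int_0^{t'} V(t') V(t'')\, v(t'')\,dt''\,dt', \end{aligned} $$

and on taking the average,

$$ \langle v(t)\rangle = u_0 + \alpha^2 \int_0^t\!\!\int_0^{t'} \langle V(t') V(t'')\, v(t'')\rangle\,dt''\,dt'. \tag{8.17} $$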


This equation is exact, but of no help in finding $\langle v(t)\rangle$ because it contains
the higher-order correlation $\langle V(t')V(t'')v(t'')\rangle$. Suppose, however, that
one could write,

$$ \langle V(t')V(t'')v(t'')\rangle \approx \langle V(t')V(t'')\rangle \langle v(t'')\rangle. \tag{8.18} $$

Then Eq. 8.17 becomes an integral equation for $\langle v\rangle$ alone,

$$ \langle v(t)\rangle = u_0 + \alpha^2 \int_0^t\!\!\int_0^{t'} \langle V(t') V(t'')\rangle \langle v(t'')\rangle\,dt''\,dt', \tag{8.19} $$

or, in terms of the original variables, and after differentiation, we are left
with the convolution equation for $\langle u(t)\rangle$,

$$ \frac{d}{dt}\langle u(t)\rangle = A_0 \langle u(t)\rangle + \alpha^2 \int_0^t \langle A_1(t)\, e^{A_0(t-t')} A_1(t')\rangle \langle u(t')\rangle\,dt'. \tag{8.20} $$

This is called Bourret’s convolution equation, or sometimes in the physics
literature, the mode-coupling approximation. It is a striking result because
it asserts that the average of $u(t)$ obeys a closed equation; thus one can
find $\langle u(t)\rangle$ without knowledge of the higher moments of $u(t)$, that is,
without going through the procedure of solving the random differential
equation $du/dt = A(t;\omega)u$ for each individual realization $\omega$ and averaging afterwards.

Bourret’s convolution equation is formally solved using the Laplace
transform, but the structure of the equation is sometimes not convenient.
A convolution equation extends over the past values of $\langle u(t)\rangle$, and as such
does not describe a Markov process. Nevertheless, with further manipulation,
we are able to turn the convolution equation into an ordinary
differential equation (albeit with modified coefficients), as was done by
Kubo using a different argument. We use the noiseless dynamics ($\alpha = 0$)
to re-write $\langle u(t')\rangle$,


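$$ \langle u(t')\rangle \approx e^{-A_0(t-t')} \langle u(t)\rangle, \tag{8.21} $$

allowing $\langle u(t)\rangle$ to pass through the integral in Eq. 8.20,

$$ \int_0^t \langle A_1(t)\, e^{A_0(t-t')} A_1(t')\rangle \langle u(t')\rangle\,dt' \approx \left( \int_0^t \langle A_1(t)\, e^{A_0(t-t')} A_1(t')\rangle\, e^{-A_0(t-t')}\,dt' \right) \langle u(t)\rangle. \tag{8.22} $$

We have assumed that $A_1(t)$ is a stationary random process, so the
integrand is a function of the time difference only,

$$ \langle A_1(t)\, e^{A_0(t-t')} A_1(t')\rangle\, e^{-A_0(t-t')} \equiv K'(t-t'). $$

Making the change of variable $\tau = t - t'$,

$$ \int_0^t K'(t-t')\,dt' = \int_0^t K'(\tau)\,d\tau. $$

The fluctuations are correlated only for small time separations ($\tau \lesssim \tau_c$),
which means the upper limit of integration is not important and can be
extended $t \to \infty$, incurring only exponentially small error,

$$ \int_0^t K'(\tau)\,d\tau \approx \int_0^\infty K'(\tau)\,d\tau. $$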

This last term is a constant (i.e., independent of time), and so we arrive at
an ordinary differential equation for $\langle u(t)\rangle$ where $\int_0^\infty K'(\tau)\,d\tau$ renormalizes
the constant coefficient matrix $A_0$,

$$ \frac{d}{dt}\langle u(t)\rangle = \left[ A_0 + \alpha^2 \int_0^\infty K'(\tau)\,d\tau \right] \cdot \langle u(t)\rangle \tag{8.23} $$

$$ \phantom{\frac{d}{dt}\langle u(t)\rangle} = \left[ A_0 + \alpha^2 \int_0^\infty \langle A_1(\tau)\, e^{A_0\tau} A_1(0)\rangle\, e^{-A_0\tau}\,d\tau \right] \cdot \langle u(t)\rangle. \tag{8.24} $$

This is called Kubo’s renormalization equation. In contrast with Bourret’s
convolution equation, Kubo’s approximation results in an ordinary differential
equation, and so the evolution of $\langle u(t)\rangle$ characterized by this equation is
Markovian (it has no memory). Although the derivation may seem ad hoc, we can use cumulants
to compute a very tight error estimate of the approximation, and in fact
show that the error incurred in deriving Kubo’s renormalization is of the
same order as the error incurred in Bourret’s approximation, so that
Eq. 8.24 is as accurate as Eq. 8.20! The cumulant expansion method also
points the way to calculation of higher-order terms (although the algebra
becomes formidable), and will be the focus of the next two sections.


8.2.3. Cumulant Expansion (Short correlation time) 


• N. G. van Kampen (1974) “A cumulant expansion for stochastic linear differential
equations,” Physica 74: 215-238.


• R. H. Terwiel (1974) “Projection operator method applied to stochastic linear
differential equations,” Physica 74: 248-265.


The hypothesis of Bourret, Eq. 8.18, is unsatisfactory because it im- 
plies an uncontrolled neglect of certain fluctuations without any a priori 
justification. Nonetheless, it is important to note that replacing an av- 
erage of a product with a product of averages underlies the microscopic 
theory of transport phenomena, and, in a sense, it is what is meant by the 
Stosszahlansatz discussed on page 59, as we show below. In this section, 
we shall develop Bourret’s approximation (and higher-order terms) with 
a careful estimate of the error incurred at each step.

Cumulants of a Stochastic Process


• R. Kubo (1962) “Generalized cumulant expansion method,” Journal of the
Physical Society of Japan 17: 1100-1120.


Let $\xi$ be a scalar random variable. The cumulants $\kappa_m$ of $\xi$ are defined by
means of the generating function,

$$ \langle e^{-i\xi t}\rangle = \exp\left[\sum_{m=1}^{\infty} \frac{(-it)^m}{m!}\,\kappa_m\right], $$

(cf. Eq. A.9 on page 263). When discussing the cumulants of a stochastic
function $\xi(t)$, it is convenient to adopt the notation $\kappa_m \equiv \langle\langle \xi^m \rangle\rangle$.
Let $\xi(t)$ be a scalar random function, then,

$$ \left\langle \exp\left[-i\int_0^t \xi(t')\,dt'\right]\right\rangle = \exp\left[\sum_{m=1}^{\infty} \frac{(-i)^m}{m!}\int_0^t\!\cdots\!\int_0^t \langle\langle \xi(t_1)\xi(t_2)\cdots\xi(t_m)\rangle\rangle\,dt_1\,dt_2\cdots dt_m\right]. \tag{8.25} $$


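The general rule for relating cumulants to the moments of $\xi(t)$ is to
partition the digits of the moments into all possible subsets; for each
partition, one writes the product of cumulants, then adds all such products.
The first few relations will make the prescription clear. Writing 1, 2, ...
for $\xi(t_1), \xi(t_2), \ldots$,

$$ \langle 1\rangle = \langle\langle 1\rangle\rangle, $$

$$ \langle 1\,2\rangle = \langle\langle 1\rangle\rangle\langle\langle 2\rangle\rangle + \langle\langle 1\,2\rangle\rangle, $$

$$ \langle 1\,2\,3\rangle = \langle\langle 1\rangle\rangle\langle\langle 2\rangle\rangle\langle\langle 3\rangle\rangle + \langle\langle 1\rangle\rangle\langle\langle 2\,3\rangle\rangle + \langle\langle 2\rangle\rangle\langle\langle 1\,3\rangle\rangle + \langle\langle 3\rangle\rangle\langle\langle 1\,2\rangle\rangle + \langle\langle 1\,2\,3\rangle\rangle, \ldots $$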
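
The partition rule lends itself to a short program. The following is a minimal sketch and not the author's code: `set_partitions` enumerates every way of splitting the labels 1, 2, ..., m into non-empty subsets, and `moment_from_cumulants` sums the corresponding products of cumulants. The callback `cumulant` is a hypothetical interface that returns $\langle\langle\cdot\rangle\rangle$ for a tuple of labels; as a check, it is filled in for a single Gaussian variable (all labels referring to the same variable), for which only the first two cumulants are non-zero.

```python
def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Either `first` forms a block of its own ...
        yield [[first]] + partition
        # ... or it joins one of the existing blocks.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def moment_from_cumulants(labels, cumulant):
    """Moment <1 2 ... m> as a sum over partitions of products of cumulants."""
    total = 0.0
    for partition in set_partitions(list(labels)):
        term = 1.0
        for block in partition:
            term *= cumulant(tuple(sorted(block)))
        total += term
    return total


MU, VAR = 2.0, 3.0  # illustrative mean and variance


def gaussian_cumulant(block):
    # Cumulants of one Gaussian variable: mean, variance, then zero.
    return {1: MU, 2: VAR}.get(len(block), 0.0)


third_moment = moment_from_cumulants((1, 2, 3), gaussian_cumulant)
fourth_moment = moment_from_cumulants((1, 2, 3, 4), gaussian_cumulant)
print(third_moment, MU**3 + 3 * MU * VAR)
print(fourth_moment, MU**4 + 6 * MU**2 * VAR + 3 * VAR**2)
```

For a stochastic process, the same routine applies with `cumulant` returning the multi-time cumulant of the times attached to the labels in each block.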


An expansion in cumulants is preferable over an expansion in moments
for two main reasons. First, the expansion occurs in the exponential,
greatly decreasing the risk of secular terms¹. The second advantage of
cumulants is that higher-order cumulants vanish if any of the arguments
are uncorrelated with any of the others. For example, suppose $\xi(t)$ has a
short correlation time $\tau_c$; specifically, that for $|t_1 - t_2| \gg \tau_c$, $\xi(t_1)$ and
$\xi(t_2)$ are uncorrelated. Then,

$$ \langle 1\,2\rangle = \langle 1\rangle\langle 2\rangle, $$


¹ Secular terms are terms arising in a perturbation approximation that diverge in
time although the exact solution remains well-behaved. For example, after a short
time $1 - t + t^2/2!$ is a terrible approximation of $e^{-t}$.

so we have,

$$ \langle 1\,2\rangle = \langle 1\rangle\langle 2\rangle = \langle\langle 1\rangle\rangle\langle\langle 2\rangle\rangle + \langle\langle 1\,2\rangle\rangle, $$

which implies $\langle\langle 1\,2\rangle\rangle = 0$, since $\langle\langle 1\rangle\rangle = \langle 1\rangle$ and $\langle\langle 2\rangle\rangle = \langle 2\rangle$. Similarly, one can show that having $\xi(t_1)$
and $\xi(t_2)$ uncorrelated is sufficient to ensure that all higher-order cumulants
containing them vanish, irrespective of their correlations with the other arguments. That
is to say, the $m^{\text{th}}$ cumulant vanishes as soon as the sequence of times
$\{t_1, t_2, \ldots, t_m\}$ contains a gap large compared to $\tau_c$ between any two
adjacent times $t_i$ and $t_{i+1}$.

Since the cumulants are approximately zero once the gap among any
of the time points exceeds $\tau_c$, the integrand of each integral in Eq. 8.25 is
appreciable only within an $(m-1)$-dimensional sphere of radius $\sim \tau_c$ moving
along the time axis. Said another
way, each integral in Eq. 8.25 grows linearly in time (once $t \gg \tau_c$).
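
As a worked illustration of this linear growth, take $m = 2$ and assume, purely as an example, an exponential second cumulant $\langle\langle \xi(t_1)\xi(t_2)\rangle\rangle = e^{-|t_1-t_2|/\tau_c}$. The double integral in Eq. 8.25 can then be done in closed form,

$$ \int_0^t\!\!\int_0^t e^{-|t_1-t_2|/\tau_c}\,dt_1\,dt_2 = 2\int_0^t (t-\tau)\,e^{-\tau/\tau_c}\,d\tau = 2\tau_c t - 2\tau_c^2\left(1 - e^{-t/\tau_c}\right) \approx 2\tau_c t \quad (t \gg \tau_c), $$

so the integral is quadratic in $t$ only for $t \ll \tau_c$ and crosses over to linear growth, with slope set by the correlation time, once $t$ exceeds a few $\tau_c$.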


Cumulant Expansion 


Returning again to our original linear stochastic differential equation,

$$ \frac{dv}{dt} = \alpha V(t)\, v(t), \tag{8.26} $$

(with $V(t) = e^{-tA_0} A_1(t)\, e^{tA_0}$), we see that the equation contains three time
scales:


1. The time $\tau_V$ over which $V(t)$ varies, which is not relevant
in what follows.


2. The time $1/\alpha$ over which $v(t)$ varies.


3. The correlation time $\tau_c$ over which $V(t)$ is correlated.
That is, $\langle V(t)V(t')\rangle \approx 0$ for $|t - t'| \gg \tau_c$.


While the naive perturbation solution initially proposed failed because we
expanded in powers of $(\alpha t)$, an improved perturbation scheme should be
possible if we expand in powers of $(\alpha\tau_c)$.² If $\alpha\tau_c \ll 1$, then it is possible
to subdivide the interval $[0,t]$ so that $t \gg \tau_c$ and yet $\alpha t \ll 1$. This
is precisely the Stosszahlansatz underlying the derivation of the master
equation, and precisely the assumption Einstein made when constructing
his model of Brownian motion: that the time axis can be subdivided
into steps $\Delta t$ which are long enough that the microscopic variables relax
to their equilibrium state ($\Delta t \gg \tau_c$), though not long enough that the
observable state $v(t)$ is appreciably changed ($1/\alpha \gg \Delta t$). The condition
that $\alpha\tau_c \ll 1$ underlies the Bourret and Kubo approximations. As we now
show, it is possible to justify these approximations and obtain higher-order
terms as a systematic expansion in powers of $(\alpha\tau_c)$ using the cumulants
of a stochastic process.


² The dimensionless parameter $\alpha\tau_c$ is called the Kubo number.

We can write the formal solution of Eq. 8.26 as a time-ordered exponential
(see Section B.1),

$$ v(t) = \left\lceil \exp\left\{\alpha\int_0^t V(t')\,dt'\right\}\right\rceil v(0), \tag{8.27} $$

where the time-ordering operator $\lceil\;\rceil$ is short-hand for the iterated series,
Eq. 8.14. For the average process, $\langle v(t)\rangle$, we have the suggestive
expression,

$$ \langle v(t)\rangle = \left\langle\left\lceil \exp\left\{\alpha\int_0^t V(t')\,dt'\right\}\right\rceil\right\rangle v(0), \tag{8.28} $$

since the average and the time-ordering commute. Eq. 8.25 shows how
the cumulants of a scalar stochastic process can be used to express the
averaged exponential. For the matrix $V(t)$, it would seem at first sight
that Eq. 8.25 is of little use, but we are saved by the time-ordering. Inside
the time-ordering operator, we can freely commute the matrix operator
$V(t)$ since everything must be put in chronological order once the
time-ordering operator is removed. We are therefore justified in writing,


$$ \langle v(t)\rangle = \left\lceil \exp\left\{\alpha\int_0^t \langle\langle V(t_1)\rangle\rangle\,dt_1 + \frac{\alpha^2}{2}\int_0^t\!\!\int_0^t \langle\langle V(t_1)V(t_2)\rangle\rangle\,dt_2\,dt_1 + \cdots\right\}\right\rceil v(0). \tag{8.29} $$

This equation is exact; no approximations have been made up to this
point. There are several important characteristics of Eq. 8.29, however,
that make it well-suited as a starting point for approximations of an
evolution equation for $\langle v(t)\rangle$. First, the cumulant expansion appears in the
exponent, greatly reducing the risk of secular terms. Second, successive
terms in the exponent are of order $\alpha, \alpha^2, \ldots$, and grow linearly in time.
For the matrix $V(t)$, we assume the fluctuations have finite correlation
time $\tau_c$ in the sense that the cumulants of the matrix elements

$$ \langle\langle V_{ij}(t_1)\, V_{kl}(t_2) \cdots V_{qr}(t_m)\rangle\rangle = 0 $$

vanish as soon as a gap of order $\tau_c$ appears among any of the time-points.
We then have a good estimate of the size of each term in the expansion:
the $m^{\text{th}}$ term is of order $\alpha^m\, t\, \tau_c^{m-1}$.

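To obtain from the exact Eq. 8.29 a practical approximation of the
process $\langle v(t)\rangle$, we begin eliminating terms in the expansion. For example,
if we eliminate all terms $\alpha^2$ and higher, we have,

$$ \langle v(t)\rangle \approx \left\lceil \exp\left\{\alpha\int_0^t \langle\langle V(t_1)\rangle\rangle\,dt_1\right\}\right\rceil v(0), $$

which solves the differential equation,

$$ \frac{d}{dt}\langle v(t)\rangle = \alpha\,\langle V(t)\rangle\,\langle v(t)\rangle; $$

the simplest possible approximation where the fluctuations are replaced
by their average. Suppose that $\langle V\rangle = 0$ (see page 176), then truncating
Eq. 8.29 after two terms,

$$ \langle v(t)\rangle \approx \left\lceil \exp\left\{\frac{\alpha^2}{2}\int_0^t\!\!\int_0^t \langle\langle V(t_1)V(t_2)\rangle\rangle\,dt_2\,dt_1\right\}\right\rceil v(0), $$

or, equivalently,

$$ \langle v(t)\rangle = \left\lceil \exp\left\{\alpha^2\int_0^t\!\!\int_0^{t_1} \langle\langle V(t_1)V(t_2)\rangle\rangle\,dt_2\,dt_1\right\}\right\rceil v(0). \tag{8.30} $$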


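We introduce the operator,

$$ K(t_1) = \int_0^{t_1} \langle\langle V(t_1)V(t_2)\rangle\rangle\,dt_2, $$

and consider the differential equation

$$ \frac{d}{dt}\langle v(t)\rangle = \alpha^2 K(t)\,\langle v(t)\rangle. \tag{8.31} $$

The solution is given by,

$$ \langle v(t)\rangle = \left\lceil \exp\left\{\alpha^2\int_0^t K(t_1)\,dt_1\right\}\right\rceil v(0). \tag{8.32} $$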

This solution is not quite the same as Eq. 8.30, although the difference is of
higher order in $\alpha\tau_c$. We may therefore use Eq. 8.31 as an approximation of
the evolution equation for $\langle v(t)\rangle$. In the original representation, Eq. 8.31
is identical to Eq. 8.22,

$$ \frac{d}{dt}\langle u(t)\rangle = \left\{ A_0 + \alpha^2\int_0^t \langle\langle A_1(t)\, e^{A_0\tau} A_1(t-\tau)\rangle\rangle\, e^{-A_0\tau}\,d\tau \right\} \langle u(t)\rangle, \tag{8.33} $$

which reduces to Bourret’s approximation once $\langle u(t)\rangle$ is brought back
inside the integral, or Kubo’s approximation once the upper limit of
integration is extended $t \to \infty$. No ad hoc assumptions regarding the
factoring of correlations have been made. Nor have any secular terms
appeared in the derivation. Furthermore, we have a good estimate of the
error incurred by the approximation: the error is of order $(\alpha^3\tau_c^2)$.


Summary of approximation methods


Approximation           Condition on $\tau_c$      Condition on $t$

Born iteration          $\alpha\tau_c \ll 1$       $\alpha t \ll 1$
Bourret convolution     $\alpha\tau_c \ll 1$       all $t$
Kubo renormalization    $\alpha\tau_c \ll 1$       $t \gg \tau_c$
Static averaging        $\alpha\tau_c \gg 1$       all $t$


Figure 8.2: User’s guide for various approximations of the evolution
equation for $\langle u(t)\rangle$ derived from a linear random differential
equation. Redrawn after Brissaud and Frisch (1974).


With the many approximation methods discussed above, it is helpful
to have a “user’s guide” outlining the range of applicability of each
approximation scheme (Figure 8.2). Loosely speaking, if the correlation
time of the fluctuations is short, then the Born approximation is only good
over short times, the Kubo renormalization is good for times longer than
the correlation time, and the Bourret approximation is good over short
and long times, although it is a convolution equation, so it is slightly
more cumbersome to deal with than the Kubo renormalization. If the
correlation time of the noise is long, then the static averaging approximation
is useful: if the Kubo number $(\alpha\tau_c)$ is very large ($\alpha\tau_c \gg 1$), then
the approximation is good for all time, while if the Kubo number is not
large, then the approximation is good for times shorter than the noise
correlation time.
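
The short-correlation-time entries of the guide can be compared on a scalar toy problem. The sketch below rests on assumptions that are not in the text: $V(t) = -i\xi_1(t)$ with Gaussian $\xi_1$ and covariance $e^{-|\tau|/\tau_c}$, and $u_0 = 1$. Under those assumptions the exact average is known in closed form, the Kubo equation gives $e^{-\alpha^2\tau_c t}$, and the Bourret convolution equation $\dot y = -\alpha^2\int_0^t e^{-(t-t')/\tau_c}\,y(t')\,dt'$ is equivalent to the ordinary differential equation $\ddot y + \dot y/\tau_c + \alpha^2 y = 0$ with $y(0) = 1$, $\dot y(0) = 0$.

```python
import math


def exact_gaussian(t, alpha, tau_c):
    # Exact average for Gaussian noise with exponential covariance.
    return math.exp(-alpha**2 * (tau_c * t - tau_c**2 * (1.0 - math.exp(-t / tau_c))))


def kubo(t, alpha, tau_c):
    # Kubo renormalization: constant damping rate alpha^2 * tau_c.
    return math.exp(-alpha**2 * tau_c * t)


def bourret(t, alpha, tau_c):
    # Closed-form solution of y'' + y'/tau_c + alpha^2 y = 0, y(0)=1, y'(0)=0.
    # Assumes alpha * tau_c < 1/2 so that both characteristic roots are real.
    disc = math.sqrt(1.0 / tau_c**2 - 4.0 * alpha**2)
    r_plus = 0.5 * (-1.0 / tau_c + disc)
    r_minus = 0.5 * (-1.0 / tau_c - disc)
    return (r_plus * math.exp(r_minus * t) - r_minus * math.exp(r_plus * t)) / (r_plus - r_minus)


ALPHA, TAU_C = 1.0, 0.1  # Kubo number alpha * tau_c = 0.1 (illustrative)
for t in (0.05, 1.0, 20.0):
    print(t, exact_gaussian(t, ALPHA, TAU_C), bourret(t, ALPHA, TAU_C), kubo(t, ALPHA, TAU_C))
```

In this example the Bourret solution tracks the exact average for times shorter than $\tau_c$, where the Kubo form is visibly off, and both stay close to the exact result at long times.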


8.2.4 Model coefficients (Arbitrary correlation time) 


• A. Brissaud and U. Frisch (1974) “Solving linear stochastic differential equations,”
Journal of Mathematical Physics 15: 524-534.


Empirically, the statistics of a stationary random function are often
limited to high-confidence estimates of the single-point probability and
the time-covariance function. As outlined in Sections 8.2.1 and 8.2.2, for
a stochastic differential equation with multiplicative noise,

$$ \frac{dx}{dt} = \xi(t)\,x, \tag{8.34} $$

if the correlation time is long, then the single-point probability distribution
dominates the dynamics of the average $\langle x(t)\rangle$; if the correlation time
is short, then the time-covariance dominates. Brissaud and Frisch proposed
a very clever approach to estimate the moments of $x(t)$: given a
single-point probability distribution $p(\xi)$ and a time-covariance function
$\langle\langle \xi(t)\xi(t-\tau)\rangle\rangle$, approximate the function $\xi(t)$ with a process $\zeta(t)$ that
shares these features, but for which the model is exactly solvable.


8.2.5 Poisson step process 


The essential idea is to combine elements of the static and Bourret
approximations to develop an exact solution for $\langle x(t)\rangle$. To that end, the random
coefficient $\xi(t)$ is replaced by what is called a Poisson step process $\zeta(t)$,
with identical single-point probability $p(\xi) = p(\zeta)$ and time-covariance
$\langle\langle \xi(t)\xi(t-\tau)\rangle\rangle = \langle\langle \zeta(t)\zeta(t-\tau)\rangle\rangle$. Specifically, a Poisson step process is
defined in the following way:


Definition: The step-wise constant function $\zeta(t)$ is called a Poisson step
process (or Kubo-Anderson process) if the jump times $t_i$ are independently
distributed in $(-\infty, \infty)$ with density $\nu$ (Poisson distributed), and
$\zeta(t)$ is constant, $\zeta(t) = \zeta_i$, for $t_i \le t < t_{i+1}$. The $\zeta_i$ are independent random
variables with probability density $p(\zeta)$.

The Poisson step process is a stationary process with single-point probability
density $p(\zeta)$. Re-scaling the deterministic dynamics (or taking the
interaction picture as in Bourret’s approximation), the average $\langle\zeta\rangle = 0$
without loss of generality. The time-covariance function is,

$$ \langle\langle \zeta(t)\zeta(t-\tau)\rangle\rangle = \langle \zeta^2\rangle\, e^{-\nu|\tau|}. $$

The convenient feature of the Poisson step process is that the average
$\langle x(t)\rangle$ characterized by the multiplicative stochastic differential equation
can be computed exactly (at least formally in terms of the Laplace transform).
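
The definition translates directly into a simulation. The sketch below is illustrative only; the helper names, the choice of a standard normal for $p(\zeta)$, and the sample size are assumptions. It builds a path on $[0, \tau]$ from exponentially distributed waiting times with rate $\nu$, draws a fresh independent value at each jump, and estimates the covariance $\langle \zeta(0)\zeta(\tau)\rangle$ for comparison with $\langle\zeta^2\rangle e^{-\nu\tau}$.

```python
import bisect
import math
import random


def poisson_step_path(nu, t_max, draw_value, rng):
    """Return (jump_times, values) of a step process on [0, t_max]."""
    jump_times = [0.0]
    values = [draw_value(rng)]  # value held on the first interval
    t = rng.expovariate(nu)     # waiting time to the first jump
    while t < t_max:
        jump_times.append(t)
        values.append(draw_value(rng))  # independent draw after each jump
        t += rng.expovariate(nu)
    return jump_times, values


def evaluate(path, t):
    """Value of the step process at time t."""
    jump_times, values = path
    return values[bisect.bisect_right(jump_times, t) - 1]


def estimate_covariance(nu, tau, n_paths=20_000, seed=7):
    rng = random.Random(seed)
    draw = lambda r: r.gauss(0.0, 1.0)  # p(zeta): standard normal (assumed)
    total = 0.0
    for _ in range(n_paths):
        path = poisson_step_path(nu, tau, draw, rng)
        total += evaluate(path, 0.0) * evaluate(path, tau)
    return total / n_paths


NU, TAU = 2.0, 0.5  # illustrative values
cov_estimate = estimate_covariance(NU, TAU)
print(cov_estimate, math.exp(-NU * TAU))
```

The two values agree only within sampling error, so any comparison should allow a tolerance of a few standard errors.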

It is convenient to solve for the Green’s function of Eq. 8.34,

$$ \frac{dG(t,0)}{dt} = \zeta(t)\,G(t,0); \qquad G(0,0) = 1, \tag{8.35} $$

rather than for the solution $x(t)$ directly. That allows for non-homogeneous
or additive noise to be treated within the same framework, and simplifies
the derivation by appealing to the semi-group property of the Green’s
function,

$$ G(t,0) = G(t,t')\,G(t',0). $$


The derivation proceeds as follows: a ‘master equation’ for the average
of the full Green’s function $\langle G(t,0)\rangle_{PSP}$ (where PSP denotes the Poisson
step process) is written as a sum of two components: one component
coming from no jump between $(0,t)$ and a second component that integrates
over past jumps that have occurred between $(0,t)$.

If there is no jump between $(0,t)$, then $\zeta$ is constant, and the solution
of Eq. 8.35 is obtained as in the static approximation $(e^{\zeta t})$. The
jumps are Poisson distributed, so the probability of no jump is $e^{-\nu t}$. The
contribution to $\langle G(t,0)\rangle_{PSP}$ is,

$$ \langle G(t,0)\rangle_{\text{No jump}} = e^{-\nu t}\langle e^{\zeta t}\rangle = e^{-\nu t}\int e^{\zeta' t}\,p(\zeta')\,d\zeta' = e^{-\nu t}\,G_S(t), \tag{8.36} $$

where $G_S(t)$ is the static contribution.

For a single realization of the process $\zeta(t)$, let the jumping times
between $(0,t)$ occur at the points $t_1 < t_2 < \ldots < t_n$. By the semi-group
property of the Green’s function,

$$ G(t,0) = G(t,t_n)\,G(t_n,t_{n-1}) \cdots G(t_2,t_1)\,G(t_1,0). \tag{8.37} $$

During each sub-interval $(t_i, t_{i+1})$, $\zeta(t) = \zeta_i$ is constant, so we can solve
Eq. 8.35 explicitly as above,

$$ G(t_{i+1},t_i) = \exp\left[(t_{i+1}-t_i)\,\zeta_i\right]. $$

For a Poisson distributed variable, the probability that the length of an
interval between two successive jumping-times lies between $t$ and $t+dt$ is
$\nu e^{-\nu t}dt$, and the probability to obtain a configuration of $n$ jumping-times
located at $(t_1,t_2,\ldots,t_n)$ within $dt_1\,dt_2\cdots dt_n$ is,

$$ \nu e^{-\nu t_1}\cdot \nu e^{-\nu(t_2-t_1)} \cdots \nu e^{-\nu(t_n-t_{n-1})}\cdot e^{-\nu(t-t_n)}\,dt_1\,dt_2\cdots dt_n. $$


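Using Eq. 8.37, summing over $n = 1, 2, \ldots, \infty$, and using $\langle G(t,0)\rangle_{\text{No jump}}$
for the $n = 0$ case, the average of the full $\langle G(t,0)\rangle_{PSP}$ is given by an
infinite series of integrals,

$$ \begin{aligned} \langle G(t,0)\rangle_{PSP} = {} & e^{-\nu t}G_S(t) + \int_0^t e^{-\nu(t-t_1)}G_S(t-t_1)\;\nu e^{-\nu t_1}G_S(t_1)\,dt_1 + \cdots \\ & + \int_0^t\!\!\int_0^{t_n}\!\!\cdots\!\int_0^{t_2} e^{-\nu(t-t_n)}G_S(t-t_n)\;\nu e^{-\nu(t_n-t_{n-1})}G_S(t_n-t_{n-1}) \times \cdots \\ & \qquad \times\, \nu e^{-\nu t_1}G_S(t_1)\,dt_1\cdots dt_n + \cdots \end{aligned} \tag{8.38} $$

This infinite series is equivalent to the convolution equation,

$$ \langle G(t,0)\rangle_{PSP} = e^{-\nu t}G_S(t) + \nu\int_0^t e^{-\nu(t-t_1)}G_S(t-t_1)\,\langle G(t_1,0)\rangle_{PSP}\,dt_1. \tag{8.39} $$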
To assure yourself that this is so, ‘solve’ Eq. 8.39 by successive iteration
(as in the equations leading up to Eq. 8.17). Convolution equations are
simple to solve using the Laplace transform; in particular, the Laplace
transform $\langle \hat G(s)\rangle_{PSP}$ is given by an algebraic function of the Laplace
transform of the static Green’s function $\hat G_S(s)$,

$$ \langle \hat G(s)\rangle_{PSP} = \frac{\hat G_S(s+\nu)}{1 - \nu\,\hat G_S(s+\nu)}, $$

where the shift in the Laplace variable comes from the exponential
pre-factor: $\mathcal{L}[e^{-\nu t}f(t)] = \hat f(s+\nu)$.
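
As a sanity check of the convolution equation and its Laplace-transform solution, consider a two-state example chosen here for illustration (it is not taken from the text): $\zeta = \pm i\Delta$ with equal probability, so that $G_S(t) = \cos(\Delta t)$ and $\hat G_S(s) = s/(s^2+\Delta^2)$. The formula above then gives $\langle\hat G(s)\rangle_{PSP} = (s+\nu)/(s^2+\nu s+\Delta^2)$, which is the transform of the solution of $\ddot y + \nu\dot y + \Delta^2 y = 0$ with $y(0) = 1$, $\dot y(0) = 0$. The sketch solves Eq. 8.39 directly with the trapezoidal rule and compares with that damped oscillation.

```python
import math


def solve_psp_green(nu, delta, t_end=4.0, n_steps=800):
    """Trapezoidal solution of G(t) = k(t) + nu * int_0^t k(t - s) G(s) ds,
    with kernel k(t) = exp(-nu t) * cos(delta t)."""
    h = t_end / n_steps
    kernel = [math.exp(-nu * i * h) * math.cos(delta * i * h) for i in range(n_steps + 1)]
    green = [1.0]  # G(0) = 1
    for n in range(1, n_steps + 1):
        interior = sum(kernel[n - j] * green[j] for j in range(1, n))
        rhs = kernel[n] + nu * h * (0.5 * kernel[n] * green[0] + interior)
        # The unknown G(t_n) also appears in the last trapezoid weight.
        green.append(rhs / (1.0 - 0.5 * nu * h * kernel[0]))
    return h, green


def damped_oscillator(t, nu, delta):
    # Solution of y'' + nu y' + delta^2 y = 0, y(0)=1, y'(0)=0 (needs delta > nu/2).
    omega = math.sqrt(delta**2 - 0.25 * nu**2)
    return math.exp(-0.5 * nu * t) * (math.cos(omega * t) + 0.5 * nu / omega * math.sin(omega * t))


NU, DELTA = 1.0, 2.0  # illustrative values
h, green = solve_psp_green(NU, DELTA)
max_error = max(abs(g - damped_oscillator(i * h, NU, DELTA)) for i, g in enumerate(green))
print("largest deviation from the Laplace-transform solution:", max_error)
```

The quadratic cost of the direct sum is acceptable for a few thousand steps; for longer runs the Laplace-domain formula is the more practical route.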


8.3. Example — Kubo Oscillator 


• N. G. van Kampen (1976) “Stochastic differential equations,” Physics Reports
24: 171-228.

• M. Gitterman, The noisy oscillator: The first hundred years, from Einstein
until now, (World Scientific Publishing Company, 2005).


The complex harmonic oscillator with random frequency used to illustrate
the static approximation in Section 8.2.1,

$$ \frac{du}{dt} = -i\xi(t)\,u; \qquad u(0) = u_0, $$

is a useful example to illustrate the other approximation methods derived
in this chapter. Here, we will focus on a stationary perturbation and
explicitly write the average of the process as $\xi_0$,

$$ \frac{du}{dt} = -i\left(\xi_0 + \alpha\xi_1(t)\right)u; \qquad u(0) = u_0, $$

where $\langle \xi_1(t)\xi_1(t-\tau)\rangle$ is characterized by a finite correlation time $\tau_c$.


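Kubo approximation, $\alpha\tau_c \ll 1$


Using Kubo’s renormalization approximation, Eq. 8.24, we have,

$$ \frac{d\langle u\rangle}{dt} = \left[-i\xi_0 - \alpha^2\int_0^\infty \langle \xi_1(t)\xi_1(t-\tau)\rangle\,d\tau\right]\langle u\rangle. \tag{8.40} $$

For a random perturbation with integral $\int_0^\infty \langle \xi_1(t)\xi_1(t-\tau)\rangle\,d\tau = \tau_c$ (e.g.,
$\langle \xi_1(t)\xi_1(t-\tau)\rangle = e^{-|\tau|/\tau_c}$), the equation for the average is,

$$ \frac{d\langle u\rangle}{dt} = \left[-i\xi_0 - \alpha^2\tau_c\right]\langle u\rangle, $$

irrespective of the details of the correlation function. The solution is then,

$$ \langle u(t)\rangle = u_0\exp\left[-i\xi_0 t - \alpha^2\tau_c t\right]. $$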
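
A direct simulation gives a feel for the quality of this result. The sketch below is a hedged illustration rather than the author's procedure: it assumes that $\xi_1(t)$ is an Ornstein-Uhlenbeck process with unit variance and covariance $e^{-|\tau|/\tau_c}$, advances it with the exact one-step update $\xi_1 \to \rho\,\xi_1 + \sqrt{1-\rho^2}\,N(0,1)$, $\rho = e^{-h/\tau_c}$, accumulates the phase $\int_0^t(\xi_0 + \alpha\xi_1)\,dt'$ by the trapezoidal rule, and averages $e^{-i\,\text{phase}}$ over an ensemble with $u_0 = 1$. All parameter values and names are arbitrary.

```python
import cmath
import math
import random


def simulate_mean(xi0, alpha, tau_c, t_end, n_paths=4000, h=0.02, seed=3):
    """Ensemble average of u(t_end) for du/dt = -i (xi0 + alpha xi1(t)) u, u(0) = 1."""
    rng = random.Random(seed)
    n_steps = int(round(t_end / h))
    rho = math.exp(-h / tau_c)          # one-step correlation of the noise
    kick = math.sqrt(1.0 - rho * rho)   # keeps the variance of xi1 equal to one
    total = 0j
    for _path in range(n_paths):
        xi1 = rng.gauss(0.0, 1.0)       # start in the stationary state
        phase = 0.0
        for _step in range(n_steps):
            xi1_next = rho * xi1 + kick * rng.gauss(0.0, 1.0)
            phase += h * (xi0 + 0.5 * alpha * (xi1 + xi1_next))
            xi1 = xi1_next
        total += cmath.exp(-1j * phase)
    return total / n_paths


XI0, ALPHA, TAU_C, T_END = 1.0, 1.0, 0.1, 5.0  # Kubo number 0.1, t >> tau_c
simulated = simulate_mean(XI0, ALPHA, TAU_C, T_END)
kubo_prediction = cmath.exp(-1j * XI0 * T_END - ALPHA**2 * TAU_C * T_END)
print(simulated, kubo_prediction)
```

The agreement is limited by sampling error (of order $1/\sqrt{N}$ for $N$ paths) and by the small corrections of higher order in $\alpha\tau_c$.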


8.4 Example — Parametric Resonance 


In contrast to the example above, we now consider a linear differential 
equation with time-varying coefficients — the Mathieu equation, 


$$ \ddot{u} + [a + 2q\cos 2t]\,u = 0. \tag{8.41} $$


Over a range of $a$ and $q$, the periodic time-dependence in the frequency
leads to instability via a mechanism called parametric resonance. This
is the same effect that is exploited by children “pumping” a swing. To
transform the Mathieu equation to a stochastic differential equation, we
allow the parametric forcing to have a random amplitude, $q \to q(1+\alpha\xi(t))$.
In vector-matrix notation,

$$ \frac{d}{dt}\begin{pmatrix} u \\ \dot u \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -(a+2q\cos 2t) & 0 \end{pmatrix}\begin{pmatrix} u \\ \dot u \end{pmatrix} + \alpha\xi(t)\begin{pmatrix} 0 & 0 \\ -2q\cos 2t & 0 \end{pmatrix}\begin{pmatrix} u \\ \dot u \end{pmatrix}. \tag{8.42} $$


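Because the Mathieu equation has time-dependent coefficients, it is no
longer possible to find a closed form for the solution, even in the absence
of fluctuations. Nevertheless, the formal solution can be written as a
time-ordered exponential of the deterministic coefficient matrix $A_0(t)$
(the first matrix in Eq. 8.42),

$$ \begin{pmatrix} u(t) \\ \dot u(t) \end{pmatrix} = \left\lceil \exp\left\{\int_0^t A_0(t_1)\,dt_1\right\}\right\rceil \begin{pmatrix} u(0) \\ \dot u(0) \end{pmatrix}; \qquad \alpha = 0. \tag{8.43} $$

If we adopt the Kubo renormalization approximation, the correction to
the deterministic dynamics is the integral,

$$ \int_0^\infty \left\langle A_1(t) \left\lceil \exp\left\{\int_{t-\tau}^{t} A_0(t_1)\,dt_1\right\}\right\rceil A_1(t-\tau)\right\rangle \times \left\lceil \exp\left\{-\int_{t-\tau}^{t} A_0(t_1)\,dt_1\right\}\right\rceil d\tau. \tag{8.44} $$

Here, $\left\lceil \exp\left\{-\int_{t-\tau}^{t} A_0(t_1)\,dt_1\right\}\right\rceil$ is the matrix inverse of the deterministic
propagator (cf. Eq. 8.43), and

$$ A_1(t) = \alpha\,\xi(t)\begin{pmatrix} 0 & 0 \\ -2q\cos 2t & 0 \end{pmatrix}. \tag{8.45} $$

Let $\xi(t)$ be a colored noise process, with an exponential correlation function,

$$ \langle \xi(t)\xi(t-\tau)\rangle = \exp\left[-\frac{|\tau|}{\tau_c}\right]. $$

In that case, we can invoke Watson’s lemma (Section B.6.2) to approximate
the integral (8.44) as a series in powers of $\tau_c$ (Exercise 9).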
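
Although the time-ordered exponential has no closed form here, it is easy to approximate numerically as an ordered product of short-time propagators. The sketch below is one possible discretization, chosen for illustration, and covers only the noiseless case $\alpha = 0$: each factor is the second-order Taylor approximation of $\exp[h A_0(t)]$ evaluated at the midpoint of the step (using $A_0^2 = -(a+2q\cos 2t)\,I$), and later times multiply from the left. Propagating over one period $\pi$ of the coefficient gives the monodromy matrix; by Floquet theory the deterministic solution is unstable when the magnitude of its trace exceeds 2.

```python
import math


def step_matrix(t_mid, h, a, q):
    # Second-order Taylor form of exp(h * A0(t_mid)), A0 = [[0, 1], [-w, 0]].
    w = a + 2.0 * q * math.cos(2.0 * t_mid)
    diag = 1.0 - 0.5 * w * h * h
    return ((diag, h), (-w * h, diag))


def matmul(x, y):
    # Product of two 2x2 matrices stored as nested tuples.
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def monodromy(a, q, n_steps=20_000):
    # Ordered product of short-time propagators over one period of cos(2t).
    h = math.pi / n_steps
    phi = ((1.0, 0.0), (0.0, 1.0))
    for n in range(n_steps):
        phi = matmul(step_matrix((n + 0.5) * h, h, a, q), phi)  # later times on the left
    return phi


def is_unstable(a, q):
    phi = monodromy(a, q)
    return abs(phi[0][0] + phi[1][1]) > 2.0


print("a = 1, q = 0.2 unstable:", is_unstable(1.0, 0.2))
print("a = 2, q = 0.2 unstable:", is_unstable(2.0, 0.2))
```

The first parameter pair lies inside the principal resonance tongue near $a = 1$, the second lies between tongues; the same ordered-product construction can be reused to evaluate the propagators appearing in the Kubo correction.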
8.5 Optional — Stochastic Delay Differential Equations


In many models of physical systems, several degrees of freedom are elim- 
inated from the governing equations by introducing a time delay into the 
dynamics. The result is a system of delay differential equations. For linear 
delay differential equations with stochastic coefficients, it is straightfor- 
ward to derive an approximation for the averaged process as we have 
done in the preceding sections. In particular, we shall derive Bourret’s 
convolution approximation for a linear stochastic delay equation in the 
limit that the time-delay is large compared to the correlation time of the 
fluctuating coefficients. To that end, consider the linear delay differential 
equation, 


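$$ \frac{dx}{dt} = ax + bx^{\tau}, \qquad x(t) = x_0 \ \text{for}\ t \le 0, $$

where $\tau_d$ is the delay time and $x^{\tau} = x(t-\tau_d)$. The one-sided Green’s
function for this system, $g(t)$, is calculated using the Laplace transform.
The auxiliary equation characterizing $g(t)$ is

$$ \frac{dg}{dt} - a\,g(t) - b\,g(t-\tau_d) = \delta(t), \qquad g(t) = 0 \ \text{for}\ t < 0. $$

Taking the Laplace transform $\mathcal{L}\{\cdot\}$, with $\hat g(s) = \mathcal{L}\{g(t)\}$,

$$ s\,\hat g(s) - a\,\hat g(s) - b\,e^{-s\tau_d}\,\hat g(s) = 1, $$

since $\mathcal{L}\{g(t-\tau_d)\} = e^{-s\tau_d}\mathcal{L}\{g(t)\}$. Explicitly, we write,

$$ g(t) = \mathcal{L}^{-1}\{\hat g(s)\} = \mathcal{L}^{-1}\left\{\frac{1}{s - a - b\,e^{-s\tau_d}}\right\}. $$
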
Consider the original delay equation, but now with zero-mean multiplicative
noise $\eta(t)$,

$$ \frac{dx}{dt} = ax + bx^{\tau} + \eta(t)\,x. \tag{8.46} $$

Using the one-sided Green’s function $g(t)$, we write the formal solution of
(8.46) as a convolution,

$$ x(t) = x_0\,g(t) + \int_0^t g(t-t')\,\eta(t')\,x(t')\,dt'. $$
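
With substitution into the right-hand side of (8.46),

$$ \frac{dx}{dt} = ax + bx^{\tau} + \eta(t)\,x_0\,g(t) + \int_0^t g(t-t')\,\eta(t)\,\eta(t')\,x(t')\,dt'. $$

Taking the ensemble average (the term linear in $\eta$ vanishes because the
noise has zero mean), we are left with the evolution equation for
the first moment,

$$ \frac{d}{dt}\langle x\rangle = a\langle x\rangle + b\langle x^{\tau}\rangle + \int_0^t g(t-t')\,\langle \eta(t)\,\eta(t')\,x(t')\rangle\,dt'. $$

Invoking Bourret’s approximation to factor the cross-correlation,

$$ \langle \eta(t)\,\eta(t')\,x(t')\rangle \approx \langle \eta(t)\,\eta(t')\rangle\,\langle x(t')\rangle, $$

we have

$$ \frac{d}{dt}\langle x\rangle = a\langle x\rangle + b\langle x^{\tau}\rangle + \int_0^t g(t-t')\,\langle \eta(t)\,\eta(t')\rangle\,\langle x(t')\rangle\,dt'. $$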
0 


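We further assume that $\eta(t)$ is a Gaussian distributed, stationary, Markov
process so that,

$$ \langle \eta(t)\,\eta(t')\rangle = K(t-t') = \sigma^2\exp\left[-\frac{|t-t'|}{\tau_c}\right], $$

where $\tau_c$ is the correlation time of the noise. The approximate evolution
equation then simplifies to,

$$ \frac{d}{dt}\langle x(t)\rangle = a\langle x(t)\rangle + b\langle x(t-\tau_d)\rangle + \int_0^t g(t-t')\,K(t-t')\,\langle x(t')\rangle\,dt'. $$

Writing the convolution explicitly,

$$ \frac{d}{dt}\langle x(t)\rangle = a\langle x(t)\rangle + b\langle x(t-\tau_d)\rangle + \left[g(t)\,K(t)\right] * \langle x(t)\rangle. $$

The Laplace transform $\langle \hat x(s)\rangle$ is readily obtained,

$$ \langle \hat x(s)\rangle = \frac{x_0}{s - a - b\,e^{-s\tau_d} - \sigma^2\,\mathcal{L}\left\{g(t)\,e^{-t/\tau_c}\right\}}. $$

Explicit inversion of the Laplace transform $\hat g(s)$ is difficult, but since
$\mathcal{L}\{g(t)\,e^{-t/\tau_c}\} = \hat g(s + 1/\tau_c)$, we have

$$ \langle \hat x(s)\rangle = \frac{x_0}{\left(s - a - b\,e^{-s\tau_d}\right) - \sigma^2\left[\hat s - a - b\,e^{-\hat s\tau_d}\right]^{-1}}, \tag{8.47} $$

where $\hat s = s + \frac{1}{\tau_c}$.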

In the absence of noise, the asymptotic stability of $x(t)$ is determined
by the real part of the pole of the Laplace transform $\hat x(s)$, or equivalently
by the real part of $s^*$, where $s^*$ is the root of the characteristic equation,

$$ s^* - a - b\,e^{-s^*\tau_d} = 0. $$

Similarly, the asymptotic stability of the first moment $\langle x(t)\rangle$ is determined
by the real part of $s^*$ satisfying,

$$ \left(s^* - a - b\,e^{-s^*\tau_d}\right) - \sigma^2\left[\left(s^* + \frac{1}{\tau_c}\right) - a - b\,e^{-\left(s^* + 1/\tau_c\right)\tau_d}\right]^{-1} = 0. $$
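
Both characteristic equations are transcendental, but their real roots are easy to bracket. The sketch below is illustrative only (parameter values and function names are assumptions): writing $D(s) = s - a - b\,e^{-s\tau_d}$, it locates by bisection the real root of $D(s) = 0$ and of $D(s) - \sigma^2/D(s + 1/\tau_c) = 0$. For $b \ge 0$, $D$ is increasing along the real axis, so each bracket below contains exactly one real root. A final check takes $b = 0$, where the second equation reduces to a quadratic whose root tends to $a + \Gamma/2$ when $\sigma^2 = \Gamma/(2\tau_c)$ and $\tau_c \to 0$.

```python
import math


def det_char(s, a, b, tau_d):
    # D(s) = s - a - b exp(-s tau_d): noiseless characteristic function.
    return s - a - b * math.exp(-s * tau_d)


def mean_char(s, a, b, tau_d, sigma2, tau_c):
    # Characteristic function for the first moment under the Bourret closure.
    return det_char(s, a, b, tau_d) - sigma2 / det_char(s + 1.0 / tau_c, a, b, tau_d)


def bisect_root(f, lo, hi, tol=1e-13):
    # Plain bisection; assumes f(lo) and f(hi) have opposite signs.
    f_lo = f(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


A, B, TAU_D, SIGMA2, TAU_C = -1.0, 0.5, 1.0, 0.2, 0.5  # illustrative values
s_det = bisect_root(lambda s: det_char(s, A, B, TAU_D), -5.0, 5.0)
s_mean = bisect_root(lambda s: mean_char(s, A, B, TAU_D, SIGMA2, TAU_C), s_det, 5.0)
print("noiseless root:", s_det, " first-moment root:", s_mean)

# White-noise check without delay (b = 0): the root approaches a + Gamma / 2.
GAMMA, SMALL_TAU_C = 0.4, 1e-4
s_white = bisect_root(
    lambda s: mean_char(s, A, 0.0, TAU_D, GAMMA / (2.0 * SMALL_TAU_C), SMALL_TAU_C),
    A, A + 10.0,
)
print("white-noise root:", s_white, " expected about:", A + 0.5 * GAMMA)
```

In this example the real root for the first moment lies to the right of the noiseless one, i.e. the multiplicative noise pushes the averaged dynamics toward instability. Complex roots are not searched for here.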


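To examine the white-noise limit (in the Stratonovich sense), it is convenient
to make the substitution $\sigma^2 = \Gamma/(2\tau_c)$ so that,

$$ \lim_{\tau_c \to 0} \sigma^2 e^{-|t|/\tau_c} = \lim_{\tau_c \to 0} \frac{\Gamma}{2\tau_c}\,e^{-|t|/\tau_c} = \Gamma\,\delta(t), $$

with $\int_0^\infty \delta(t)\,dt = \frac{1}{2}$ (the Stratonovich interpretation of the Dirac delta
function). We then write the resolvent equation as,

$$ \tau_c\left(s^* - a - b\,e^{-s^*\tau_d}\right) - \frac{\Gamma}{2}\left(\hat s^* - a - b\,e^{-\hat s^*\tau_d}\right)^{-1} = 0, \qquad \hat s^* = s^* + \frac{1}{\tau_c}. \tag{8.48} $$

In this form, the roots of the resolvent equation can be developed as an
asymptotic series in a straightforward manner (see Exercise 11).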


Suggested References 


Much of the early part of this chapter comes from,


• A. Brissaud and U. Frisch (1974) “Solving linear stochastic differential
equations,” Journal of Mathematical Physics 15: 524-534.


Two excellent sources are van Kampen’s review,


• N. G. van Kampen (1976) “Stochastic differential equations,” Physics
Reports 24: 171-228,


and the review, translated from Russian,


• V. I. Klyatskin and V. I. Tatarskii (1974) “Diffusive random process
approximation in certain nonstationary statistical problems of
physics,” Soviet Physics Uspekhi 16: 494-511.


This second article is notable in its use of the Novikov-Furutsu relation
(see Section 11.5 on page 251).


[Figure 8.3. Panel A, scheme 1: $x(t+\Delta t) = x(t) - x(t)\cdot\eta(t)\,\Delta t$, with the correlated noise $\eta(t)$ updated at each step using $\rho = \exp[-\Delta t/\tau_c]$. Panel A, scheme 2: $x(t+\Delta t) = x(t) - x(t)\cdot\sqrt{\Delta t}\cdot N(0,1)$. Panel B: $\langle x(t)\rangle$ versus time.]

Figure 8.3: Jenny’s paradox - Exercise 1. A) Two different schemes
to represent the differential equation $\dot{x} = -x\,\eta(t)$, where $\eta(t)$ is Gaussian
white noise. B) The averaged behaviour using each of the schemes shown
in panel A. The average is taken over an ensemble of 1000. The step-size
$\Delta t$ is smaller than the correlation time $\tau_c = 10^{-1}$. $N(\mu,\sigma^2)$ represents
a random number drawn from a Normal distribution with mean $\mu$ and
variance $\sigma^2$.

Exercises 


1. Jenny’s paradox: In the course of testing her code to numerically
integrate stochastic differential equations, Jenny noticed that the
following differential equation,

$$ \frac{dx}{dt} = -\eta(t)\,x; \qquad x(0) = 1, $$

(where $\eta(t)$ is Gaussian white noise) had different averaged behaviour
$\langle x(t)\rangle$ depending upon how she coded the white noise (Figure 8.3).


(a) Explain what each routine shown in Figure 8.3A is intended
to do. What is the fundamental difference between the two
routines?


(b) Why don’t both routines return the same result? What should
Jenny expect the averages to look like? What will happen to
the output of scheme 1 if the step-size $\Delta t$ is reduced? What
will happen to the output of scheme 1 if the correlation time $\tau_c$
is reduced (with $0 < \Delta t < \tau_c$)? Will the two schemes coincide
in that limit?


2. Simulation of the Langevin equation (in the Itô sense):


(a) Show that the update scheme, Eq. 8.2, follows from the Langevin
equation interpreted in the Itô sense.


(b) Show that the update scheme, Eq. 8.10, provides the necessary
correlation function for the Ornstein-Uhlenbeck process $F(t)$.


3. Gaussian distributed random number generator: In the simulation
of random differential equations, it is often necessary to generate
Gaussian-distributed random numbers. Since there are many
tried and true routines for generating a unit uniform random number,
we shall consider how to transform a realization $r \in U(0,1)$ to
a number $n \in N(0,1)$. Though most coding packages include
subroutines for just this purpose, explicit derivation of the algorithm is
illuminating.


(a) What makes it impossible to directly invert a unit uniform
random variable into a normally distributed random number?
Why is this not a problem for inversion to an exponential or
Cauchy distribution?


(b) Using two unit uniform realizations $r_1$ and $r_2$, introduce two
auxiliary random numbers $s = \sigma[2\ln(1/r_1)]^{1/2}$ and $\theta = 2\pi r_2$.
Show that $x_1 = \mu + s\cos\theta$ and $x_2 = \mu + s\sin\theta$ are a pair
of statistically independent sample values of the Gaussian
distribution $N(\mu,\sigma^2)$. Hint: Use the Joint Inversion Generating
Method described on page 271, along with the subordinate
density functions $Q_1(s)$ and $Q_2^{(1)}(\theta|s)$.


4. Cumulant expansion: In the derivation of the cumulant expan- 
sion, several details were left out. 


(a) What is the difference between Eq. 8.30 and Eq. 8.32? 
(b) From Eq. 8.33, derive the Bourret and Kubo approximations. 


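5. Show that for a zero-mean Gaussian process $F(t)$,

$$ \left\langle \exp\left[i\int_0^t F(t')\,dt'\right]\right\rangle = \exp\left[-\int_0^t (t-\tau)\,\sigma^2(\tau)\,d\tau\right], $$

where $\langle F(t_1)F(t_2)\rangle = \sigma^2(|t_1 - t_2|)$.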


6. Eigenvalues of the averaged random harmonic oscillator:
From the equation governing the averaged response of the random
harmonic oscillator, Eq. 8.40, derive the eigenvalues of the system
for the following choices of noise autocorrelation function:

(a) Exponential correlation: $\langle\langle \xi(t)\xi(t-\tau)\rangle\rangle = \sigma^2\exp\left[-\frac{|\tau|}{\tau_c}\right]$


o2 


(b) Uniform correlation: ((€ (t) € (t — r))) = { ff ae eens 


—Te STS Te 


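(c) Triangle correlation:

$$ \langle\langle \xi(t)\xi(t-\tau)\rangle\rangle = \begin{cases} \dfrac{\sigma^2}{\tau_c} - \dfrac{\sigma^2}{\tau_c^2}\,|\tau| & -\tau_c \le \tau \le \tau_c \\[4pt] 0 & \text{otherwise} \end{cases} $$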


(d) Damped sinusoidal correlation: ((€(t)€(t — 7))) = e ag) 


I7| 


Do you notice any commonalities among the eigenvalues resulting 
from all of these various choices? Explain. 


7. Redfield equations: We can exploit the linearity of Eq. 8.12 to
generate an equation for the higher moments.


(a) Using tensor notation, each component of the $N$-dimensional
differential equation is written $\dot{u}_i = \sum_{j=1}^{N} A_{ij}u_j$. Use this
formulation to derive the differential equation for the products $u_i u_k$.


(b) Use either Bourret’s convolution equation or Kubo’s renormalization
approximation to derive an equation for the second
moments $\langle u_i u_k\rangle$ for the damped random harmonic oscillator
discussed in Section 8.3. Verify the approximate equation for
the second moments using an ensemble of simulation data.


(c) Determine the conditions for stability of the second moments
(mean-squared stability) for the damped random harmonic oscillator
discussed in Section 8.3. It may be helpful to review
stability conditions for ordinary differential equations, cf. Appendix
B.2.1. Verify the stability bounds using an ensemble of
simulation data.


(d) Use Kubo’s renormalization approximation to derive an equation
for the second moments $\langle u_i u_k\rangle$ for the parametric oscillator
discussed in Section 8.4. What happens to the corrected
dynamics in the limit of white noise $\tau_c \to 0$? Verify the
approximate equation for the second moments using an ensemble
of simulation data.


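8. Additive noise: Consider the inhomogeneous random differential
equation,

\[
\frac{du}{dt} = \left[ A_0 + \alpha A_1(t) \right] \cdot u + f(t),
\]

where $A_1(t)$ and $f(t)$ are correlated, and $f(t) = f_0 + f_1(t)$. We seek
an approximate evolution equation for the mean $\langle u \rangle$.

(a) Notice that the inhomogeneous equation can be written as a
homogeneous equation if the state space is expanded,

\[
\frac{d}{dt} \begin{pmatrix} u \\ 1 \end{pmatrix}
  = \begin{pmatrix} A_0 + \alpha A_1(t) & f(t) \\ 0 & 0 \end{pmatrix}
    \cdot \begin{pmatrix} u \\ 1 \end{pmatrix},
\]

allowing the methods developed in this section to be readily
applied. To that end, show that,

\[
\exp\left[ \tau \begin{pmatrix} A_0 & f_0 \\ 0 & 0 \end{pmatrix} \right]
  = \begin{pmatrix} e^{\tau A_0} & A_0^{-1}\left( e^{\tau A_0} - I \right) f_0 \\ 0 & 1 \end{pmatrix}.
\]

(b) Derive an equation for $\langle u \rangle$.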


9. Parametric resonance: It is challenging to compute the evolution
equation for the average process $\langle u \rangle$ characterized by the
stochastic Mathieu equation, Eq. 8.42. The primary difficulty lies in
the time dependence of the coefficients, which results in the
deterministic Mathieu equation not having a closed-form solution.


(a) Write out the formal solution of the deterministic Mathieu
equation (i.e., $\alpha = 0$ in Eq. 8.42) as a series of iterated integrals
(the Born approximation).


(b) For a narrowly-peaked correlation function, we can invoke
Watson’s lemma (Section B.6.2) to express integral (8.44) as a
series in the correlation time $\tau_c$. Write out the details of this
approximation and thereby arrive at the renormalized
equation for $\langle u \rangle$.

(c) The stability of the mean of an oscillatory process is not
particularly informative; much more important is the energy
stability determined by the stability of $\langle u^2 \rangle + \langle \dot{u}^2 \rangle$. Use the
Redfield equations (Exercise 7) to generate approximate equations
governing $\langle u^2 \rangle$, $\langle u \dot{u} \rangle$ and $\langle \dot{u}^2 \rangle$. Under what conditions are these
asymptotically stable? Verify the approximation results using
an ensemble of simulation data.


10. Random differential equations with Markov noise: For the
random differential equation $du/dt = F(u, t; Y(t))$, where $u \in \mathbb{R}^N$
and $Y(t)$ is a Markov process having probability density $\Pi(y, t|y_0, t_0)$
obeying the master equation $d\Pi/dt = W\Pi$, the joint probability
density $P(u, y, t|u_0, y_0, t_0)$ obeys its own master equation,

\[
\frac{\partial P}{\partial t}
  = -\sum_{i=1}^{N} \frac{\partial}{\partial u_i} \left[ F_i(u, t; y)\, P \right] + W P. \tag{8.49}
\]


This equation is often not very useful since W can be quite large 
(even infinite). Nevertheless, Eq. 8.49 is exact and it makes no 
assumptions about the correlation time of the fluctuations. 


(a) Show that if the random differential equation is linear (with
coefficients that do not depend upon time),

\[
\frac{du_i}{dt} = \sum_{j=1}^{N} A_{ij}(Y(t))\, u_j,
\]

then Eq. 8.49 reduces to a particularly simple form.


(b) Define the marginal averages $m_i$,

\[
m_i(y, t) = \int u_i\, P(u, y, t)\, du.
\]

For the linear system introduced in (10a), find the evolution
equation for $m_i$. These equations must be solved with initial
values $m_i(y, 0) = u_{0i}\,\Pi(y, 0)$. The average of the process of
interest is given by the integral,

\[
\langle u_i(t) \rangle = \int m_i(y, t)\, dy.
\]


(c) If $Y(t)$ is a dichotomic Markov process (i.e. a process taking
one of two values $\pm 1$ with transition rate $\gamma$; see Exercise 4
on page 76), then the joint probability is simply the
two-component vector $P(u, \pm 1, t) = [P_+(u, t)\ \ P_-(u, t)]$ with
transition matrix,

\[
W = \begin{pmatrix} -\gamma & \gamma \\ \gamma & -\gamma \end{pmatrix}.
\]

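Consider the random differential equation $\dot{u} = -i\omega(t)u$ where $u$
is complex and $\omega(t)$ is a random function of time. This example
was introduced by Kubo to illustrate the effect of fluctuations
on spectral line broadening. Let $\omega(t) = \omega_0 + \alpha\xi(t)$, where $\xi(t)$
is a dichotomic Markov process. Show that Eq. 8.49 gives two
coupled equations,

\[
\begin{aligned}
\frac{\partial P_+}{\partial t} &= i(\omega_0 + \alpha) \frac{\partial}{\partial u}\left( u P_+ \right) - \gamma P_+ + \gamma P_-, \\
\frac{\partial P_-}{\partial t} &= i(\omega_0 - \alpha) \frac{\partial}{\partial u}\left( u P_- \right) - \gamma P_- + \gamma P_+.
\end{aligned}
\]

Find the equations for the two marginal averages $m_\pm(t)$ using
the initial condition $m_\pm(0) = \frac{1}{2}a$. Find $\langle u(t) \rangle = m_+(t) + m_-(t)$.
What can you say about the two limits $\gamma \ll \alpha$ (slow
fluctuations) and $\gamma \gg \alpha$ (fast fluctuations)?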


11. Stochastic delay differential equations: Read Section 8.5.


(a) To determine the stability of the stochastic delay differential
equation is not easy. Re-write the resolvent equation by
nondimensionalizing all of the parameters appearing in Eq. 8.48
with respect to the delay time $\tau_d$, and introduce the
perturbation parameter $\varepsilon = \tau_c/\tau_d$.


(b) Assume the correlation time is short compared with the delay
time ($\varepsilon = \tau_c/\tau_d \ll 1$), and develop a perturbation series for
$s = s_0 + \varepsilon s_1 + \varepsilon^2 s_2 + \ldots$. The zero-order term $s_0$ is given in
terms of Lambert functions. Find an expression for $s_1$.


(c) Derive the analogue of Eq. 8.47 for a system of delay differential
equations, i.e. find an expression for $\langle \hat{x}(s) \rangle$ where $x(t)$ is an
n-dimensional vector.


CHAPTER 9 




MACROSCOPIC EFFECTS OF NOISE 


The great success of deterministic ordinary differential equations in
mathematical modeling tends to foster the intuition that including
fluctuations will simply distort the deterministic signal somewhat,
leading to a distribution of states that follows the macroscopic
solution. In that sense, studying stochastic processes seems to be an
interesting hobby, but not very practical. We have seen, however, that
the statistics of the fluctuations can be used to measure parameters
that are not accessible on an observable scale: for example,
Avogadro’s number as determined by Perrin from Einstein’s work on
Brownian motion (page 8), Johnson’s determination of Avogadro’s number
from thermal noise in resistors using Nyquist’s analysis (page 46),
and even the mutation rate in bacteria (see Exercise 5 on page 23).
In this chapter, we consider some examples where stochastic models
exhibit behaviour that is different from their deterministic
counterparts¹: unstable equilibrium points at 0 become stable
(Keizer’s paradox, Section 9.1), or stable systems exhibit regular
oscillations when fluctuations are included (Sections 9.2 and 9.3.1).
In these cases, stochastic models provide insight into system
behaviour that is simply not available from deterministic formulations.


1 Deterministic counterpart means the deterministic ordinary differential equations
obtained in the macroscopic limit of the master equation, i.e., number of individuals
and the volume go to infinity, while the density is held fixed. Alternatively, the
zeroth-order term in the linear noise approximation.


9.1 Keizer’s Paradox


• J. Keizer, Statistical thermodynamics of nonequilibrium processes
(Springer-Verlag, 1987), p. 164-169.


• M. Vellela and H. Qian (2007) “A quasistationary analysis of a stochastic
chemical reaction: Keizer’s paradox,” Bulletin of Mathematical Biology 69: 1727.


The fluctuations described by the master equation come about, in
part, because the systems under study are composed of discrete particles.
It should be no surprise then that as the number of particles decreases, 
the discrete nature of the constituents will become manifest. So, for 
example, a fixed point that is unstable with respect to infinitesimal per- 
turbations is stable if approached along a discrete lattice. This is Keizer’s 
paradox. We shall examine his argument in more detail below, but it is 
important to emphasize that Keizer used his example to illustrate how 
correct macroscopic behaviour is sometimes not exhibited by the master 
equation, suggesting caution in interpreting asymptotic behaviour of the 
master equation literally. Some authors, however, have taken variations of 
Keizer’s paradox to suggest that the master equation formalism is some- 
how fundamentally flawed, or that the macroscopic limit is problematic. 
Such claims should not be taken too seriously. 

As a concrete example, consider the following (nonlinear)
autocatalytic reaction scheme:

\[
X \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} 2X, \qquad
X \xrightarrow{k_2} \varnothing. \tag{9.1}
\]

The deterministic equation governing the concentration of the species X
is given by the differential equation,

\[
\frac{dX}{dt} = (k_1 - k_2)\, X - k_{-1} X^2. \tag{9.2}
\]

The two equilibrium points for the model are,

\[
X^{ss} = 0 \quad \text{and} \quad X^{ss} = \frac{k_1 - k_2}{k_{-1}}. \tag{9.3}
\]

It is straightforward to show that $X^{ss} = 0$ is unstable, while the
equilibrium point $X^{ss} = (k_1 - k_2)/k_{-1}$ is stable. Re-cast as a master
equation, the probability $p(n, t)$ of the system having $n$ molecules of X
at time $t$ obeys the (nonlinear) master equation,

\[
\frac{dp}{dt} = \hat{k}_1 \left( \mathbf{E}^{-1} - 1 \right) n p
  + \hat{k}_2 \left( \mathbf{E} - 1 \right) n p
  + \hat{k}_{-1} \left( \mathbf{E} - 1 \right) n (n - 1) p, \tag{9.4}
\]

where we have absorbed the volume V into the definition of the rate
constants: $\hat{k}_{-1} = k_{-1}/V$, etc. Writing out each component of $p(n)$ as a
separate element $p_n$ of a vector $\mathbf{p}$, the master equation can be expressed
in terms of the transition matrix W (Exercise 1b),

\[
\frac{d\mathbf{p}}{dt} = W \cdot \mathbf{p}. \tag{9.5}
\]


It is more convenient to separate each element in Eq. 9.5 as a coupled 
system of ordinary differential equations, 


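\[
\begin{aligned}
\frac{dp_0}{dt} &= \hat{k}_2\, p_1, \\
\frac{dp_1}{dt} &= 2\left( \hat{k}_{-1} + \hat{k}_2 \right) p_2 - \left( \hat{k}_1 + \hat{k}_2 \right) p_1, \\
\frac{dp_2}{dt} &= \hat{k}_1\, p_1 - 2\left( \hat{k}_1 + \hat{k}_{-1} + \hat{k}_2 \right) p_2 + 3\left( 2\hat{k}_{-1} + \hat{k}_2 \right) p_3, \quad \ldots
\end{aligned}
\]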


Quoting from Vellela and Qian, 


By induction, all probabilities p, for n > 0 are zero. Because 
the sum of all the probabilities must add to one, this forces 
po = 1. So we have, 


p*(0)=1 and p*(n)=0, n>0, (9.6) 


as the probability distribution for the unique steady state of 
the stochastic model. Note that the steady state of the deter- 
ministic model are fixed points, and the steady state of the 
stochastic model has a distribution. The stochastic model 
shows that eventually there will be no X left in the sys- 
tem. This is in striking contrast to the previous deterministic 
model, which predicts that the concentration of X will stabi- 
lize at the nonzero x*, while x* = 0 is unstable. This is the 
Keizers paradox. 
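
The absorbing character of the state $n = 0$ can be read directly off the transition matrix. The sketch below builds a truncated version of W for the scheme in Eq. 9.1 and checks that probability is conserved and that the distribution concentrated at $n = 0$ is stationary. The truncation at `n_max` (births out of the last state are switched off) and the numerical rate constants are assumptions made only for illustration.

```python
def transition_matrix(k1, k2, km1_hat, n_max):
    """Truncated transition matrix: W[m][n] is the rate of flow from state n into state m."""
    size = n_max + 1
    W = [[0.0] * size for _ in range(size)]
    for n in range(size):
        # X -> 2X; suppressed at the truncation boundary (an assumption of this sketch)
        birth = k1 * n if n < n_max else 0.0
        # X -> 0 and 2X -> X, with the volume absorbed into km1_hat
        death = k2 * n + km1_hat * n * (n - 1)
        if n < n_max:
            W[n + 1][n] += birth
        if n > 0:
            W[n - 1][n] += death
        W[n][n] -= birth + death
    return W


def matvec(W, p):
    """Right-hand side of dp/dt = W . p."""
    size = len(p)
    return [sum(W[m][n] * p[n] for n in range(size)) for m in range(size)]


if __name__ == "__main__":
    W = transition_matrix(k1=1.0, k2=0.1, km1_hat=0.009, n_max=50)
    p_absorbed = [1.0] + [0.0] * 50
    print(max(abs(v) for v in matvec(W, p_absorbed)))  # zero: no flow out of n = 0
```

Because the column of W belonging to $n = 0$ vanishes identically, no probability ever leaves that state, which is the content of Eq. 9.6.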


This ‘paradox’ is of the same form as Zermelo’s paradox discussed 
on page 67 and is resolved in precisely the same fashion. For systems 
with a reasonable number of molecules, say X(0) = 1000, the paradox 
disappears. Quoting from Keizer (p. 166), 


Since $n_X^0$ is large, the probability of $n_X$ taking on other
values will grow, and in a time $\tau_1 = 1/k_1$ it will become a sharp
Gaussian centered at $n_X^\infty = Vk_1/2k_2$ [the macroscopic
equilibrium point]. On a much longer time scale, the order of
$\tau_2 = \tau_1 e^{Vk_1/2k_2}$, the systems of the ensemble in the low-$n_X$
tail of the Gaussian will fall into the absorbing state at $n_X = 0$.
This creates a new peak near $n_X = 0$ and, if one waits for the
period of time $\tau_2$, almost all members of the ensemble will have
no X molecules. The important point here is the difference in
the two time scales $\tau_1$ and $\tau_2$. For a large system, V becomes
infinite. This has no effect on ka, which is independent of 
the volume, whereas 72 depends strongly on the volume and 
becomes infinite like e”. Thus to witness the probability in 
Eq. 9.6 one would have to wait a time of the order of 101°” 
times longer than it takes to achieve the distribution centered 
at $n_X^\infty = Vk_1/2k_2$. This is tantamount to the probability in
Eq. 9.6 being unobservable. 
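
The separation of time scales can be seen in a direct stochastic simulation of the scheme in Eq. 9.1. The following is a minimal sketch of a Gillespie-type simulation; the rate constants, the system size (about 100 molecules at the macroscopic equilibrium), the run length and the seed are illustrative assumptions, not values taken from Keizer or from Vellela and Qian.

```python
import random


def gillespie_keizer(k1, k2, km1_hat, n0, t_end, seed=1):
    """Simulate X -> 2X, 2X -> X, X -> 0; return (final n, smallest n, largest n)."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    n_low = n_high = n0
    while n > 0:  # n = 0 is absorbing: all propensities vanish there
        a_birth = k1 * n
        a_death = k2 * n + km1_hat * n * (n - 1)
        a_total = a_birth + a_death
        t += rng.expovariate(a_total)  # exponentially distributed waiting time
        if t > t_end:
            break
        if rng.random() * a_total < a_birth:
            n += 1
        else:
            n -= 1
        n_low = min(n_low, n)
        n_high = max(n_high, n)
    return n, n_low, n_high


if __name__ == "__main__":
    # Start near the macroscopic equilibrium n = (k1 - k2) / km1_hat = 100.
    print(gillespie_keizer(k1=1.0, k2=0.1, km1_hat=0.009, n0=100, t_end=50.0))
```

Over many relaxation times the trajectory stays in a band around the macroscopic equilibrium and never approaches the absorbing state, in line with the argument quoted above. Reducing the system size shrinks the band and makes extinction observable.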


We will generally be interested in systems where we are able to ap- 
ply the linear noise approximation. Under the conditions of applicability 
of this approximation, the paradoxical distinction between the behaviour 
of the master equation and its deterministic counterpart disappears. As 
Keizer says, “if used uncritically, [the master equation] can lead to mean- 
ingless results.” There are situations where the behaviour of a stochastic 
system, even with a large number of molecules, is obviously in conflict 
with the predictions of the deterministic model, coming from mechanisms 
distinct from the separation of time scales underlying Keizer’s paradox. 
We shall examine two examples in the following sections. Other deviant 
effects are described in: 


• M. S. Samoilov and A. P. Arkin (2006) “Deviant effects in molecular
reaction pathways,” Nature Biotechnology 24: 1235-1240.


9.2 Oscillations from Underdamped Dynamics


Before we consider quantification of noise-induced oscillations, we
collect relevant background scattered through previous chapters.
Looking back to the multivariate linear noise approximation (Eq. 6.10
on page 126), we found that to leading order in the system size $\Omega$, the
fluctuations $\alpha$ obey the linear Fokker-Planck equation,

\[
\frac{\partial \Pi}{\partial t}
  = -\sum_{i,j} \Gamma_{ij}\, \partial_i \left( \alpha_j \Pi \right)
    + \frac{1}{2} \sum_{i,j} D_{ij}\, \partial_i \partial_j \Pi, \tag{9.7}
\]

where $\partial_i \equiv \partial/\partial\alpha_i$, $\Gamma_{ij} = \partial f_i/\partial x_j$ and,

\[
D = S \cdot \mathrm{diag}[\nu] \cdot S^T, \tag{9.8}
\]

where, again, $f$ are the macroscopic reaction rates, $S$ is the stoichiometry
matrix and $\nu$ is the vector of reaction propensities. The matrices $\Gamma$ and
$D$ are independent of $\alpha$, which appears only linearly in the drift term. As
a consequence, the distribution $\Pi(\alpha, t)$ will be Gaussian for all time. In
particular, at equilibrium $\Gamma_s$ and $D_s$ will be constant and the fluctuations
are distributed with density,

\[
\Pi_s(\alpha) = \left[ (2\pi)^d \det \Xi_s \right]^{-1/2}
  \exp\left[ -\frac{1}{2}\, \alpha^T \cdot \Xi_s^{-1} \cdot \alpha \right],
\]

and variance $\Xi_s = \langle\langle \alpha \cdot \alpha^T \rangle\rangle$ determined by,

\[
\Gamma_s \cdot \Xi_s + \Xi_s \cdot \Gamma_s^T + D_s = 0. \tag{9.9}
\]

Finally, the steady-state autocorrelation decays exponentially (see
Eq. 5.18),

\[
B(t) = \langle \alpha(t) \cdot \alpha^T(0) \rangle = \exp\left[ \Gamma_s t \right] \cdot \Xi_s. \tag{9.10}
\]


In Chapter 2, we introduced the fluctuation spectrum $S(\omega)$ (see
Section 2.3 on page 39),

\[
S(\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i\omega\tau} B(\tau)\, d\tau
\iff
B(\tau) = \int_{-\infty}^{\infty} e^{i\omega\tau} S(\omega)\, d\omega, \tag{9.11}
\]

which is the Fourier transform of the autocorrelation function, and
provides the distribution of the frequency content of the fluctuations.
In multivariate systems, it is straightforward to show that the Fourier
transform of Eq. 9.10 is (Exercise 2),

\[
S_\alpha(\omega) = \frac{1}{2\pi} \left[ \Gamma_s + i\omega I \right]^{-1}
  \cdot D_s \cdot \left[ \Gamma_s^T - i\omega I \right]^{-1}. \tag{9.12}
\]

Since $S_\alpha(\omega)$ contains the distribution of the frequency content of the
fluctuations, narrow peaks in $S_\alpha(\omega)$ indicate coherent noise-induced
oscillations, as we discuss below.
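
As a quick consistency check, Eq. 9.12 can be evaluated for a single species. The lines below are a sketch under stated assumptions: the state is scalar, the drift coefficient is written $\Gamma_s = -1/\tau_c$ with a relaxation time $\tau_c$ (a symbol introduced here for convenience), and $\Xi_s$ denotes the stationary variance of Eq. 9.9.

```latex
% Scalar reduction of Eqs. 9.9 and 9.12 (assumed notation: \Gamma_s = -1/\tau_c)
\begin{align*}
2\,\Gamma_s\, \Xi_s + D_s = 0
  \quad &\Longrightarrow \quad D_s = \frac{2\,\Xi_s}{\tau_c}, \\
S_\alpha(\omega) = \frac{1}{2\pi}\, \frac{D_s}{\Gamma_s^2 + \omega^2}
  &= \frac{1}{\pi}\, \frac{\Xi_s\, \tau_c}{1 + \omega^2 \tau_c^2}.
\end{align*}
```

The result is a Lorentzian centered at $\omega = 0$ that decreases monotonically with $|\omega|$. A one-variable linear system therefore cannot show a spectral peak at nonzero frequency; at least two coupled variables are needed.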

Finally, in Chapter 6, it was shown that the linear Fokker-Planck
equation is equivalent to the Langevin equation. For a multivariate
system, the same result holds, and we find that Eq. 9.7 is
equivalent to,

\[
\frac{d\alpha}{dt} = \Gamma_s \cdot \alpha + B \cdot \eta, \tag{9.13}
\]

where $D_s = BB^T$ and $\eta$ is a vector of uncorrelated unit-variance white
noise with the same dimensions as the reaction propensity vector $\nu$.

Physical insight into the origin of noise-induced oscillations comes from
re-writing this equation more suggestively,

\[
\frac{d\alpha}{dt} - \Gamma_s \cdot \alpha = B \cdot \eta, \tag{9.14}
\]

where now the dynamics of $\alpha$ about the deterministic steady-state are
mapped to a harmonic oscillator with white-noise forcing. It should be
clear that the frequency response of the system is given by the eigenvalues
of the deterministic Jacobian $\Gamma_s$, and that if any of the eigenvalues of $\Gamma_s$
have a non-zero imaginary part, then the system will selectively amplify
the components of the white-noise forcing that lie near to the resonance
frequency of the system.

Said another way, looking at the spectrum at the level of species
concentration,

\[
S_x(\omega) = \frac{1}{2\pi\Omega} \left[ \Gamma_s + i\omega I \right]^{-1}
  \cdot D_s \cdot \left[ \Gamma_s^T - i\omega I \right]^{-1}, \tag{9.15}
\]

if the eigenvalues of $\Gamma_s$ are complex (i.e., underdamped), then
noise-induced oscillations are inevitable and these oscillations will
have amplitude proportional to $\Omega^{-1/2}$. In the deterministic limit
($\Omega \to \infty$; constant concentration), the amplitude of the oscillations
will of course vanish, recovering the deterministic stability implied by
the negative real parts of the eigenvalues of $\Gamma_s$.
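
The resonance argument can be made concrete with a two-variable example. The sketch below evaluates a diagonal element of the spectrum for a hypothetical Jacobian with eigenvalues $-0.1 \pm i$ and a unit diffusion matrix, and locates the peak on a frequency grid. The matrices are invented for illustration; only a diagonal element is computed, since it does not depend on the sign convention chosen for the Fourier transform.

```python
import cmath
import math


def eigenvalues_2x2(g):
    """Eigenvalues of a real 2x2 matrix from its trace and determinant."""
    trace = g[0][0] + g[1][1]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    root = cmath.sqrt(complex(trace * trace / 4.0 - det))
    return trace / 2.0 + root, trace / 2.0 - root


def spectrum_11(g, d, omega):
    """(1,1) element of (1/2pi) [G + i w I]^-1 . D . [G^T - i w I]^-1 for real G and D."""
    m11 = complex(g[0][0], omega)
    m12 = complex(g[0][1], 0.0)
    m21 = complex(g[1][0], 0.0)
    m22 = complex(g[1][1], omega)
    det = m11 * m22 - m12 * m21
    row = (m22 / det, -m12 / det)  # first row of [G + i w I]^-1
    # For real G, [G^T - i w I]^-1 is the conjugate transpose of [G + i w I]^-1.
    total = sum(row[j] * d[j][k] * row[k].conjugate() for j in range(2) for k in range(2))
    return total.real / (2.0 * math.pi)


def peak_frequency(g, d, w_max=3.0, n_grid=3001):
    """Grid frequency in [0, w_max] at which the (1,1) spectrum is largest."""
    grid = [w_max * i / (n_grid - 1) for i in range(n_grid)]
    return max(grid, key=lambda w: spectrum_11(g, d, w))


if __name__ == "__main__":
    gamma_s = [[-0.1, -1.0], [1.0, -0.1]]  # hypothetical Jacobian
    d_s = [[1.0, 0.0], [0.0, 1.0]]         # hypothetical diffusion matrix
    print(eigenvalues_2x2(gamma_s), peak_frequency(gamma_s, d_s))
```

For this choice the peak sits close to the imaginary part of the eigenvalues, and it sharpens as the real part is moved toward zero.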




Figure 9.1: Oscillations in Predator-Prey Populations. Sample of 
lynx and hare populations taken from Pineda-Krch, Blok, Dieckmann 
and Doebeli (2007) Oikos 116: 53-64; Figure 5A. Data is from Elton and 
Nicholson (1942) J. Anim. Ecol. 11: 215-244. Notice the vertical axis is 
logarithmic. 


9.2.1 Example — Malthus Predator-Prey Dynamics 


McKane AJ, Newman TJ (2005) “Predator-prey cycles from resonant amplification of 
demographic stochasticity,” Physical Review Letters 94 (21): Art. No. 218102. 


Following McKane and Newman, we consider noise-induced oscillations
in predator-prey models used in theoretical ecology. Some very
brief historical context helps to illuminate the significance of McKane
and Newman’s analysis.

Predator-prey populations often exhibit oscillatory dynamics
(Figure 9.1). One of the earliest models of predator-prey dynamics is the
Lotka-Volterra model (1925-1926), governing the birth, death and
predation among a population of predators Z and their prey P,


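\[
\begin{aligned}
\frac{dP}{dt} &= \alpha \cdot P - \beta \cdot Z \cdot P, \\
\frac{dZ}{dt} &= \gamma \cdot Z \cdot P - \delta \cdot Z.
\end{aligned} \tag{9.16}
\]

Here:

$\alpha$: birth rate of prey
$\beta$: rate of predation of P by Z
$\gamma$: efficiency of the predator to turn food into offspring
$\delta$: death rate of predators.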
Figure 9.2: Noise-induced oscillations in predator-prey models.
A) Sample stochastic simulation of the modified Lotka-Volterra model, 
Eq. 9.17. The deterministic trajectory rapidly approaches equilibrium 
(dashed line). Noise-induced oscillations are evident in the simulation 
data. B) Fluctuation spectrum for the predator population extracted 
from an ensemble of 500 stochastic simulations, compared with the ana- 
lytic approximation, Eq. 9.15. The inset corresponds to the power spec- 
trum for the prey. Taken from McKane and Newman (2005). 


While these equations do indeed exhibit oscillatory dynamics, the eigen- 
values are pure imaginary, so the magnitude of the oscillations depends 
precisely upon the initial conditions. Physically, it makes more sense for 
the model to evolve toward some stable limit cycle, irrespective of the 
initial conditions. 

A simple modification of the Lotka-Volterra model that attempts to
increase the relevance of the equations is to limit the exponential growth
of the prey P by introducing a death term corresponding to overcrowding,

\[
\begin{aligned}
\frac{dP}{dt} &= \alpha \cdot P \cdot \left( 1 - \frac{P}{K} \right) - \beta \cdot Z \cdot P, \\
\frac{dZ}{dt} &= \gamma \cdot Z \cdot P - \delta \cdot Z,
\end{aligned} \tag{9.17}
\]

where K is called the “carrying capacity of the environment,” and
corresponds to the maximum prey population that can be sustained by the
environment. Unfortunately, this modified set of equations no longer
oscillates (Exercise 4). Despite the fact that Eqs. 9.17 do not oscillate
deterministically, they do exhibit noise-induced oscillations!
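
Why noise can excite oscillations here is visible in the linearization about the coexistence fixed point. The sketch below assumes the logistic form $dP/dt = \alpha P(1 - P/K) - \beta ZP$, $dZ/dt = \gamma ZP - \delta Z$ for the modified model and uses invented parameter values. It returns the Jacobian eigenvalues at the fixed point $P^* = \delta/\gamma$, $Z^* = (\alpha/\beta)(1 - P^*/K)$.

```python
import cmath


def coexistence_eigenvalues(alpha, beta, gamma, delta, K):
    """Jacobian eigenvalues of the modified model at the coexistence fixed point."""
    p_star = delta / gamma
    z_star = (alpha / beta) * (1.0 - p_star / K)
    # Entries of the Jacobian, simplified using the fixed-point conditions.
    j11 = -alpha * p_star / K
    j12 = -beta * p_star
    j21 = gamma * z_star
    j22 = 0.0
    trace = j11 + j22
    det = j11 * j22 - j12 * j21
    root = cmath.sqrt(complex(trace * trace / 4.0 - det))
    return trace / 2.0 + root, trace / 2.0 - root


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    print(coexistence_eigenvalues(alpha=1.0, beta=0.1, gamma=0.1, delta=0.5, K=50.0))
```

For these values the eigenvalues form a complex pair with negative real part: the deterministic model spirals into the fixed point, while the nonzero imaginary part sets the frequency at which fluctuations are resonantly amplified.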

Figure 9.2A shows an example trajectory of the predator-prey dynamics
from a stochastic simulation. Figure 9.2B compares the theoretical
spectrum (Eq. 9.15) to the spectrum extracted from an ensemble of 500
stochastic simulations. The agreement is quite good.


9.3 Effective Stability Analysis (ESA)


• M. Scott, T. Hwa and B. Ingalls (2007) “Deterministic characterization of
stochastic genetic circuits,” Proceedings of the National Academy of Sciences
USA 104: 7402.


Oscillations in the predator-prey model above come from driving a system
with complex eigenvalues with full-spectrum white noise. Deviations
from deterministic behaviour can occur in systems with all-real eigenval- 
ues, as well. Here, the interplay between intrinsic noise and nonlinear 
transition rates leads to a noise-induced loss of stability. Depending upon 
the underlying phase-space, the loss of stability may simply bounce the 
state among multiple fixed points, or, in some cases, generate coherent 
noise-induced oscillations. 

The effective stability approximation is a method by which the effect
of the noise is used to renormalize the eigenvalues of the system, linearized
about a deterministic fixed point, providing conditions on the model
parameters for which stability is lost in the stochastic model. In analogy
with linear stability analysis of ordinary nonlinear differential equations,
the effective stability approximation will demonstrate that the system is
unstable, but cannot say any more than that; in particular, it cannot be
used to provide the fluctuation spectrum of the noise-induced oscillations.

The approximation method combines the linear noise approximation 
of Chapter 4 (page 100) to characterize the fluctuations, and Bourret’s 
approximation from Chapter 8 (page 177) to renormalize the eigenvalues. 
Both will be briefly reviewed below. 

To calculate the stability of the macroscopic model $\dot{x} = f(x)$ to
small perturbations, the system is linearized about the equilibrium point:
$x = x_s + x_p$,

\[
\frac{d}{dt} x_p = J^{(0)} \cdot x_p. \tag{9.18}
\]

The eigenvalues of the Jacobian $J^{(0)} = \left. \partial f/\partial x \right|_{x = x_s}$ provide the decay rate
of the exponential eigenmodes; if all the eigenvalues have negative real
part, we say the system is locally asymptotically stable.

To accommodate fluctuations on top of the small perturbation $x_p$, we
set $x = x_s + x_p + \omega\alpha(t)$, where we have written $\omega = \sqrt{\Omega^{-1}}$ to keep the
notation compact. The Jacobian

\[
J = \left. \frac{\partial f}{\partial x} \right|_{x = x_s + \omega\alpha(t)}
\]

will then be a (generally) nonlinear function of the fluctuations about the
steady-state $\alpha(t)$. In the limit $\omega \to 0$, we can further linearize $J$ with
respect to $\omega$,

\[
J \approx \left. J \right|_{\omega \to 0}
  + \omega \left. \frac{\partial J}{\partial \omega} \right|_{\omega \to 0}
  \equiv J^{(0)} + \omega J^{(1)}(t).
\]

The stability equation is then given by,

\[
\frac{d}{dt} x_p = \left[ J^{(0)} + \omega J^{(1)}(t) \right] \cdot x_p + \text{zero mean terms}. \tag{9.19}
\]


This is a linear stochastic differential equation with random coefficient
matrix $J^{(1)}(t)$ composed of a linear combination of the steady-state
fluctuations $\alpha(t)$ which have non-zero correlation time (cf. Eq. 9.10).

Our present interest is in the mean stability of the equilibrium point.
Taking the ensemble average of Eq. 9.19,

\[
\frac{d}{dt} \langle x_p \rangle = J^{(0)} \cdot \langle x_p \rangle + \omega \left\langle J^{(1)}(t) \cdot x_p \right\rangle.
\]

The right-most term is the cross-correlation between the process $x_p$ and
the coefficient matrix $J^{(1)}(t)$. Since the correlation time of $J^{(1)}(t)$ is not
small compared with the other time scales in the problem, it cannot be
replaced by white noise, and an approximation scheme must be developed
to find a closed evolution equation for $\langle x_p \rangle$.

By assumption, the number of molecules is large so the parameter
$\omega$ is small, although not so small that intrinsic fluctuations can be
ignored. To leading-order in $\omega$, the trajectory $x_p(t)$ is a random function
of time since it is described by a differential equation with random
coefficients. Derivation of the entire probability distribution of $x_p(t)$ is
usually impossible, and we must resort to methods of approximation. We
shall adopt the closure scheme of Bourret (Eq. 8.20, page 177) to arrive at a
deterministic equation for the evolution of the averaged process $\langle x_p(t) \rangle$
in terms of only the first and second moments of the fluctuations. In that
approximation, provided $J^{(0)} \gg \omega J^{(1)}$, the dynamics of $\langle x_p \rangle$ are governed
by the convolution equation,

\[
\frac{d}{dt} \langle x_p(t) \rangle = J^{(0)} \cdot \langle x_p(t) \rangle
  + \omega^2 \int_0^t J_c(t - \tau) \cdot \langle x_p(\tau) \rangle\, d\tau, \tag{9.20}
\]

where $J_c(t - \tau) = \left\langle J^{(1)}(t) \cdot e^{J^{(0)}(t - \tau)} \cdot J^{(1)}(\tau) \right\rangle$ is the time
autocorrelation matrix of the fluctuations. The equation can be solved
formally by Laplace transform,

\[
\langle \hat{x}_p(s) \rangle = \left[ sI - J^{(0)} - \omega^2 \hat{J}_c(s) \right]^{-1} \cdot \langle x_p(0) \rangle,
\]

where now $\hat{J}_c(s) = \int_0^\infty J_c(t)\, e^{-st}\, dt$. A necessary and sufficient condition
for asymptotic stability of the averaged perturbation modes $\langle x_p(t) \rangle$ is
that the roots $\lambda'$ of the resolvent,

\[
\det\left[ \lambda' I - J^{(0)} - \omega^2 \hat{J}_c(\lambda') \right] = 0, \tag{9.21}
\]

all have negative real parts ($\mathrm{Re}(\lambda') < 0$). Some insight into the behavior
of the system can be gained by considering a perturbation expansion of
the effective eigenvalues $\lambda'$ in terms of the small parameter $\omega$. We further
diagonalize $J^{(0)}$, $\mathrm{diag}[\lambda_i] = P^{-1} \cdot J^{(0)} \cdot P$, and provided the eigenvalues are
distinct, we can explicitly write $\lambda'_i$ in terms of the unperturbed eigenvalues
$\lambda_i$ to $O(\omega^4)$ as,

\[
\lambda'_i = \lambda_i + \omega^2 \left[ P^{-1} \cdot \hat{J}_c(\lambda_i) \cdot P \right]_{ii}, \tag{9.22}
\]

where $[\,\cdot\,]_{ii}$ denotes the $i$-th diagonal entry of the matrix. Notice the
matrix product $J_c(t)$ contains linear combinations of the correlation of
the fluctuations $\langle \alpha_i(t)\alpha_j(0) \rangle$. These are simply given by the exponential
autocorrelation function derived above, Eq. 9.10. In the next section, we
will apply the effective stability approximation to a model of an excitable
oscillator, using the method to construct a phase-plot in parameter space
illustrating regions of noise-induced oscillations.
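
For a single-species model the effective eigenvalue can be evaluated in closed form. The following is a sketch under stated assumptions rather than a result quoted from the references: the state is scalar, so that $J^{(0)} = \Gamma_s = \lambda < 0$; the symbol $c = f''(x_s)$ is introduced here for the curvature of the rate function, giving $J^{(1)}(t) = c\,\alpha(t)$; and $\Xi_s$ denotes the stationary variance of Eq. 9.9.

```latex
% Scalar sketch of Eq. 9.22 (assumed notation: c = f''(x_s), \Xi_s = stationary variance)
\begin{align*}
\langle \alpha(t)\alpha(0) \rangle &= \Xi_s\, e^{\lambda t},
  \qquad \Xi_s = \frac{D_s}{2|\lambda|}
  \quad \text{(Eqs. 9.9 and 9.10)}, \\
J_c(t) &= \left\langle J^{(1)}(t)\, e^{\lambda t}\, J^{(1)}(0) \right\rangle
  = c^2\, \Xi_s\, e^{2\lambda t}, \\
\hat{J}_c(s) &= \int_0^\infty c^2\, \Xi_s\, e^{2\lambda t} e^{-st}\, dt
  = \frac{c^2\, \Xi_s}{s - 2\lambda}, \\
\lambda' &= \lambda + \omega^2 \hat{J}_c(\lambda)
  = \lambda + \frac{\omega^2 c^2\, \Xi_s}{|\lambda|}
  = \lambda + \frac{\omega^2 c^2 D_s}{2\lambda^2}.
\end{align*}
```

In this sketch the correction is positive whenever the rate function is curved, so the fluctuations push the effective eigenvalue toward instability, and the mean loses stability once $\omega^2 c^2 D_s > 2|\lambda|^3$.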


9.3.1 Example — Excitable Oscillator 


• J. Vilar, H. Y. Kueh, N. Barkai, and S. Leibler (2002) “Mechanisms of
noise-resistance in genetic oscillators,” Proceedings of the National Academy
of Sciences USA 99: 5988-5992.


We consider the generic model proposed by Vilar and co-workers to
describe circadian rhythms in eukaryotes, with a transcriptional
autoactivator driving expression of a repressor that provides negative
control by sequestering activator proteins through dimerization. The
repressor and activator form an inert complex until the activator degrades,
recycling repressor back into the system. In their model, the degradation
rate of the activator, $\delta_A$, is the same irrespective of whether it is bound
in the inert complex or free in solution. We simplify their original model
somewhat, and assume fast activator/DNA binding along with rapid mRNA
turnover, leading to a reduced set of rate equations governing the
concentration of activator A, repressor R and the inert dimer C,

\[
\begin{aligned}
\frac{dA}{dt} &= \gamma_A \cdot g\!\left( \frac{A}{K_A}, f_A \right) - \delta_A \cdot A - \kappa_C \cdot A \cdot R, \\
\frac{dR}{dt} &= \gamma_R \cdot g\!\left( \frac{A}{K_R}, f_R \right) - \delta_R \cdot R - \kappa_C \cdot A \cdot R + \delta_A \cdot C, \\
\frac{dC}{dt} &= \kappa_C \cdot A \cdot R - \delta_A \cdot C.
\end{aligned} \tag{9.23}
\]

Here, the function g,

\[
g(x, f) = \frac{1 + f \cdot x}{1 + x}, \tag{9.24}
\]

characterizes the response of the promoter to the concentration of the
regulatory protein A. The fold-change in the synthesis rate, $f$, is $\gg 1$
because A is an activator. In this complicated system, there are many
dimensionless combinations of parameters that characterize the system
dynamics. The scaled repressor degradation rate $\epsilon = \delta_R/\delta_A$ is a key
control parameter in the deterministic model since oscillations occur only
for an intermediate range of this parameter. For the nominal parameter
set used in the Vilar paper, the deterministic model exhibits oscillations
over the range $0.12 < \epsilon < 40$ (Figure 9.3a, black region). We shall focus
on the parameter regime near to the phase boundary at $\epsilon = 0.12$ and
examine the role intrinsic noise plays in generating regular oscillations
from a deterministically stable system.

Applying the ESA to the oscillator model, the parameter
$\Delta_{b_A} = (b_A + 1)/(2 \cdot K_A \cdot V_{cell})$ emerges as an important measure quantifying
the discreteness in activator synthesis. Here, $b_A$ is the burst size in the
activator synthesis (see p. 113), $K_A$ is the activator/DNA dissociation
constant and $V_{cell}$ is the cell volume (with $V_{cell} = 100\,\mu\mathrm{m}^3$ as is
appropriate for eukaryotic cells).

The nominal parameter set of Vilar leads to a burstiness in activator
synthesis of $b_A = 5$ (giving $\Delta_{b_A} = 6 \times 10^{-2}$) and a burstiness in
repressor synthesis of $b_R = 10$. The phase boundary predicted by the ESA
is shown as a solid line in Figure 9.3a. Together with the deterministic
phase boundary, it bounds a region of parameter space where qualitatively
different behavior is expected from the stochastic model. We examine the
system behavior in this region by running a stochastic simulation using
the parameter choice $\epsilon = 0.1$ and $\Delta_{b_A} = 6 \times 10^{-2}$ (denoted by a cross
in Figure 9.3a). With this choice, the deterministic model is stable
(Figure 9.3b, black line). Nevertheless, a stochastic simulation of the same
model, with the same parameters including protein bursting and stochastic
dimerization, clearly shows oscillations (Figure 9.3b, grey line).


Figure 9.3: Oscillations in the excitable system. A) Phase plot
shows a region of noise induced oscillations. B) Stochastic simulation of
a deterministically stable system (black line; denoted by a cross in the
left panel) shows oscillations (grey line).


Suggested References 


Beyond the references cited above, Keizer’s paradox is discussed by
Gillespie in his seminal paper on stochastic simulation:


• D. T. Gillespie (1977) “Exact stochastic simulation of coupled chemical
reactions,” Journal of Physical Chemistry 81: 2340.


He takes the view that the stochastic simulation trajectories show very
clearly the immense time scales involved before the system reaches the
anomalous stability of the fixed point at the origin.

Another method of characterizing noise-induced oscillations is the
multiple-scales analysis of Kuske and coworkers:


• R. Kuske, L. F. Gordillo and P. Greenwood (2006) “Sustained oscillations via
coherence resonance in SIR,” Journal of Theoretical Biology 245: 459-469.


Like the method of McKane and Newman discussed in the text, the
method of Kuske allows the noise-induced oscillations to be fully
characterized in terms of fluctuation spectra and oscillation amplitude.


Exercises 


1. Keizer’s paradox: Stochastic models often exhibit behaviour that 
is in contrast to their deterministic counter-parts. In Section 9.1, 
we discussed a particular autocatalytic network. The details left 
out of the main text will be filled in below. 


(a) Use linear stability analysis to prove the claim that the
equilibrium point $X^{ss} = 0$ is unstable in the deterministic model.


(b) Derive the master equation that corresponds to the autocat- 
alytic network shown in Eq. 9.1. What is the explicit form of 
the transition matrix W appearing in Eq. 9.5? 


(c) Write a stochastic simulation of the example discussed in the 
text. Initializing the system at the macroscopic equilibrium 
point, show sample trajectories to argue that for all intents 
and purposes, the macroscopic equilibrium point is stable. Do 
this for various choices of the system size. 


2. Derivation of the multivariate frequency spectrum: Derive 
Eq. 9.12 by taking the multivariate Fourier transform of the corre- 
lation function B(t). Notice that B(t), as written in Eq. 9.10, is 
valid for $t > 0$. Find the expression for the autocorrelation function
for $t < 0$ in order to be able to compute the Fourier integral over
the interval $t \in (-\infty, \infty)$. Use the fluctuation-dissipation relation
to put the spectrum in the form shown in Eq. 9.12.


3. Resonance in driven linear systems: Some systems can be 
excited into oscillation through periodic forcing. 


(a) Damped harmonic oscillator. Solve 


—+(atiby=e (a>0,i=V-1), 


with an arbitrary initial condition. Show that in the limit
$t \to \infty$, the influence of the initial condition vanishes, and the
amplitude of $y$ is maximum when $b = \omega$.

(b) 2D Langevin equation. Consider the harmonic oscillator driven
by white noise $\eta(t)$,

\[
\frac{dy}{dt} + (a + ib)\, y = \eta(t) \qquad (a > 0).
\]

Write out the white noise as a Fourier series. What can you
say about the $t \to \infty$ amplitude maximum of $y$ in this case?


4. Predator-prey models: In the predator-prey models discussed in 
Section 9.2, several details were suppressed — these will be made 
more explicit below. 


(a) Find the fixed points of the Lotka-Volterra model, Eq. 9.16, 
and determine the stability of the system linearized about those 
fixed points. What can you say about stochastic trajectories 
in the predator-prey phase space? 


(b) Find the fixed points of the modified Lotka-Volterra model, 
Eq. 9.17, and determine the stability of the system linearized 
about those fixed points. What can you say about stochastic 
trajectories in the predator-prey phase space for this model? 


(c) For what choice of parameters would you expect noise-induced 
oscillations in the modified Lotka-Volterra model (Eq. 9.17)? 


5. Noise-induced oscillations in the Brusselator: In Section 4.2 
on p. 88, the Brusselator model was introduced, 


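$$\emptyset \longrightarrow X_1, \qquad 2X_1 + X_2 \xrightarrow{\;a\;} 3X_1, \qquad X_1 \xrightarrow{\;b\;} X_2, \qquad X_1 \longrightarrow \emptyset,$$


corresponding to the deterministic rate equations, 

$$\frac{dX_1}{dt} = 1 + aX_1^2 X_2 - (b+1)X_1,$$
$$\frac{dX_2}{dt} = -aX_1^2 X_2 + bX_1.$$

Compute the power spectrum for the fluctuations about the steady-state 
for $\langle X_1^2\rangle$, $\langle X_1 X_2\rangle$ and $\langle X_2^2\rangle$. What are the conditions for 
noise-induced oscillations in this model? 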
Figure 9.4: Exercise 6 - The autoactivator. A) Schematic of the 
positive feedback loop. B) More detailed view of a genetic circuit that 
encodes a protein that activates its own rate of transcription. C) The 
activation function g([A]) increases (nonlinearly) with increasing activator 
concentration. 


6. Effective stability approximation of the autoactivator: The 
vector-matrix formalism of the effective stability approximation leads 
to unwieldy expressions for the effective eigenvalues in systems with 
several state variables. The one-species autoactivator model, by 
contrast, allows the approximation to be computed explicitly. 


The autoactivator is a genetic regulatory motif with a product A 
that stimulates its own synthesis through a positive feedback loop 
(see Figure 9.4). Each transcription event triggers the translation of 
b activator proteins, where b is the burst parameter (see Section 5.2). 
The deterministic rate equation for this system is, 


dA 
a TA) — 8°, 


where, 


n 
sya SEEMED 
A) 

(a) Drawing the synthesis and degradation rates on the same log- 
log plot, decide how many stable fixed points the deterministic 
system has. What is the eigenvalue of the system? Identify the 
dimensionless combinations of parameters that fully character- 
ize this system, and draw a phase plot showing the boundaries 
where the number of stable fixed points changes. Hypothesize 
about what noise will do to this system. 
(b) Write out explicitly the propensity vector $\boldsymbol{\nu}$ and the 
stoichiometry matrix $\mathbf{S}$ for the stochastic version of this model. 


(c) Write out the diffusion matrix $\mathbf{D} = \mathbf{S}\cdot\mathrm{diag}[\boldsymbol{\nu}]\cdot\mathbf{S}^T$, and identify 
an additional dimensionless combination that characterizes the 
stochastic behaviour of the system. Provide a physical 
interpretation of this parameter. 


(d) Compute the effective eigenvalue, written in terms of the 
dimensionless parameters and without writing out g(A) explicitly. 
What can you say about the noise-induced correction? 


CHAPTER 10 


SPATIALLY VARYING SYSTEMS 


In previous chapters, we have considered Markov processes that converge 
to systems of ordinary differential equations in the deterministic limit. A 
natural extension of this analysis is to models exhibiting spatial variation, 
so-called random fields. Examples are Markov processes that converge 
to a system of partial differential equations in the deterministic limit. 


10.1 Reaction-transport master equation 


It is again useful to distinguish between partial differential models with 
some stochastic features either in their coefficients (multiplicative extrin- 
sic noise) or their forcing (additive extrinsic noise), as compared to models 
whose dynamics are driven by stochastic events (intrinsic noise). For in- 
trinsic noise systems modeled with a Markov process, the dynamics are 
characterized by a master equation that includes stochastic reaction and 
transport. 

Recall that for a smooth conserved scalar field $\rho(\mathbf{x},t)$, the dynamics 
take the form 

$$\frac{\partial \rho(\mathbf{x},t)}{\partial t} = -\nabla\cdot\mathbf{J} + f(\mathbf{x},t),$$

where $\mathbf{J}$ is the flux and $f(\mathbf{x},t)$ are sources and sinks. As an example, 
suppose we have a chemical species with density $n(\mathbf{x},t)$ transported via 
diffusion, 

$$\frac{\partial n}{\partial t} = D\nabla^2 n,$$

where the flux is given by Fick's law: $\mathbf{J} = -D\nabla n$. 
Suppose that n is also subject to constant synthesis and linear 
degradation. These reactions serve as a local source and sink for the 
reactant which is globally transported via diffusion, 

$$\frac{\partial n}{\partial t} = [\alpha - \beta n] + D\nabla^2 n. \tag{10.1}$$

This is an example of a reaction-diffusion equation which finds application 
to a diversity of physical phenomena: chemical reactions, predator-prey 
ecosystems, tumour growth, ... 
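
As a rough numerical illustration of Eq. 10.1 (a sketch, not taken from the text), the one-dimensional equation can be integrated with an explicit finite-difference scheme. The grid spacing `dx`, the time step `dt`, the parameter values and the zero-flux boundaries are all illustrative assumptions. With zero-flux boundaries the density should relax to the uniform value $\alpha/\beta$.

```python
def step(n, alpha, beta, D, dx, dt):
    """Advance dn/dt = alpha - beta*n + D*d2n/dx2 by one explicit Euler step.

    Zero-flux boundaries are imposed by mirroring the edge values.
    The explicit scheme needs roughly dt <= dx**2 / (2*D) to remain stable.
    """
    M = len(n)
    new = []
    for i in range(M):
        left = n[i - 1] if i > 0 else n[i]
        right = n[i + 1] if i < M - 1 else n[i]
        laplacian = (left - 2.0 * n[i] + right) / dx**2
        new.append(n[i] + dt * (alpha - beta * n[i] + D * laplacian))
    return new


def integrate(n0, alpha, beta, D, dx, dt, steps):
    """Apply `steps` Euler steps to the initial profile n0."""
    n = list(n0)
    for _ in range(steps):
        n = step(n, alpha, beta, D, dx, dt)
    return n


# Illustrative values only: all material starts in the middle of 16 cells.
initial = [0.0] * 16
initial[8] = 100.0
profile = integrate(initial, alpha=1.0, beta=0.1, D=0.1, dx=1.0, dt=0.5, steps=400)
```

After 400 steps the profile is flat to within rounding, which is the deterministic baseline against which the stochastic treatment of this chapter is compared.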

Two questions present themselves: First, how do we write out a master 
equation characterizing stochastic reaction and transport? Second, how 
can we derive approximate moment equations from such an equation? 

The idea is simple: 


• Break the spatial domain into subvolumes, with transport occurring 
randomly between subvolumes. Index the subvolumes in such a 
way that transport is modeled as one of the many possible reaction 
events. 


• Within each subvolume, reactions occur stochastically in a 
spatially-homogeneous manner. 


• Solve the flux in the first- and second-moments due to transport 
and reactions individually, then combine the results (method of 
compounding moments). 


• Remove the artificial discretization of the spatial domain by taking 
the continuum limit of the equations, i.e., take the limit of vanishing 
subvolume size. 


Figure 10.1: Subdivision of the total volume. The total volume is 
subdivided into subvolumes of size $\Omega$. The number of particles in each 
subvolume $\lambda$ is $N^\lambda$. We assume each subvolume is small enough that it is 
a well-mixed, spatially-homogeneous reaction vessel; however, we assume 
$N^\lambda$ is large enough that the van Kampen approximation can be applied 
within each subvolume. Transport occurs between neighbouring subvolumes. 


The total conditional probability taken across all subvolumes $\lambda$, 
$P(\{N^\lambda\}, t \mid \{N_0^\lambda\}, t_0)$, obeys the compound master equation, 


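$$\frac{\partial P(\{N^\lambda\},t)}{\partial t} = \underbrace{\sum_{\lambda}\sum_{N'^{\lambda}}\left[ w^{\lambda}(N^{\lambda}\mid N'^{\lambda})\,P(\ldots,N'^{\lambda},\ldots,t) - w^{\lambda}(N'^{\lambda}\mid N^{\lambda})\,P(\{N^{\lambda}\},t)\right]}_{\text{Local Changes due to Reactions}}$$

$$\qquad + \underbrace{\sum_{\{N'\}}\left[ w_{T}(\{N\}\mid\{N'\})\,P(\{N'\},t) - w_{T}(\{N'\}\mid\{N\})\,P(\{N\},t)\right]}_{\text{Transport Between Subvolumes}}.$$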


LNA for spatial systems 


The full master equation is too cumbersome to analyze directly. We will 
compute the first- and second-moments for the reaction terms alone, then 
combine with those derived from the transport. 

Ignoring transport, each subvolume becomes a closed system with in- 
ternal dynamics independent of all other subvolumes. That is, 


$$P(\{N^\lambda\},t) = \prod_\lambda P^\lambda(N^\lambda,t).$$

As in the spatially-homogeneous case, we make the ansatz, 

$$N_i^\lambda = \Omega\,y_i^\lambda + \sqrt{\Omega}\,\alpha_i^\lambda,$$

where $\Omega$ is the size (volume) of the subvolume (we are using y as the 
concentration in order that x be reserved for position). The zero'th order 
term in the resulting expansion is the local deterministic change in $\mathbf{y}^\lambda(t)$, 

$$\frac{d\mathbf{y}^\lambda}{dt} = \mathbf{S}\cdot\boldsymbol{\nu}(\mathbf{y}^\lambda), \tag{10.2}$$

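and for the fluctuations, the conditional distribution $\Pi^\lambda(\boldsymbol{\alpha},t)$ obeys a 
linear Fokker-Planck equation, 

$$\frac{\partial \Pi^\lambda}{\partial t} = -\Gamma_{ij}(\mathbf{y}^\lambda)\,\frac{\partial\left(\alpha_j^\lambda \Pi^\lambda\right)}{\partial \alpha_i^\lambda} + \frac{D_{ij}(\mathbf{y}^\lambda)}{2}\,\frac{\partial^2 \Pi^\lambda}{\partial \alpha_i^\lambda\,\partial \alpha_j^\lambda}. \tag{10.3}$$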


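Here, and throughout the remainder of this chapter, summation over 
repeated indices is implied. The first- and second-moments can be 
calculated by multiplying Eq. 10.3 by $\alpha_i^\lambda$ and $\alpha_i^\lambda\alpha_j^\mu$, respectively, and 
integrating-by-parts over all $\boldsymbol{\alpha} = \{\ldots,\alpha^{\lambda-1},\alpha^{\lambda},\alpha^{\lambda+1},\ldots\}$. Introducing the 
short-hand, $C^{\lambda\mu} = \langle\langle \alpha^\lambda (\alpha^\mu)^T \rangle\rangle$, the dynamics of the average and the 
covariance are then given by, 

$$\frac{d\langle\alpha^\lambda\rangle}{dt} = \Gamma^\lambda\cdot\langle\alpha^\lambda\rangle,$$

$$\frac{dC^{\lambda\mu}}{dt} = \Gamma^\lambda\cdot C^{\lambda\mu} + \left(\Gamma^\mu\cdot C^{\lambda\mu}\right)^T + \delta_{\lambda\mu}D^\lambda. \tag{10.4}$$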
darn 
dt 


The resulting expressions are very similar to what was derived in the 
spatially-homogeneous case, with the exception that $C^{\lambda\mu}$ is not necessarily 
symmetric; that is, in general, 

$$\langle\langle \alpha_i^\lambda \alpha_j^\mu \rangle\rangle \neq \langle\langle \alpha_j^\lambda \alpha_i^\mu \rangle\rangle.$$

Furthermore, the Kronecker-delta $\delta_{\lambda\mu}$ ensures that uncorrelated 
subvolumes remain uncorrelated. Of course, when transport is included that 
is no longer true: fluctuations at different locations can become 
correlated due to transport of the noise. 

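The discretization of space implied by the $\lambda$-superscripts is a formal 
device used to facilitate the derivation; in practice, we would like to take 
the continuum limit of the individual subvolume dynamics. With 

$$\frac{N^\lambda}{\Omega} \to n(\mathbf{x}),$$

where $\mathbf{x}$ is the position of the center of the $\lambda$-subvolume, we take the limit 
$\Omega \to 0$, with $\lim_{\Omega\to 0} \delta_{\lambda\mu}/\Omega = \delta(\mathbf{x}_1 - \mathbf{x}_2)$. In that limit, the average and 
covariance Eqs 10.2 and 10.4 reduce to the system of non-autonomous 
ordinary differential equations, 

$$\frac{d\langle n(\mathbf{x})\rangle}{dt} = \mathbf{S}\cdot\boldsymbol{\nu}(\langle n(\mathbf{x})\rangle,\mathbf{x}),$$

$$\frac{\partial C(\mathbf{x}_1,\mathbf{x}_2)}{\partial t} = \Gamma(\mathbf{x}_1)\cdot C(\mathbf{x}_1,\mathbf{x}_2) + \left[\Gamma(\mathbf{x}_2)\cdot C(\mathbf{x}_1,\mathbf{x}_2)\right]^T + \delta(\mathbf{x}_1 - \mathbf{x}_2)\,D(\mathbf{x}). \tag{10.5}$$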


Remark: It appears as though we have taken two contradictory limits 
for the size of the subvolumes: $\Omega \to \infty$ in the van Kampen approximation 
leading to Eq. 10.3 and $\Omega \to 0$ in the continuum limit! 

So long as these limits can be taken asymptotically, there is no 
contradiction. What we need is for $\Omega$ to be large enough that 'jumps' in 
local density are nearly infinitesimal, and $\Omega$ to be small enough that the 
characteristic length scales of change for $\langle n(\mathbf{x})\rangle$ are long compared to $\Omega$. 
Said another way, we assume there is a separation of scales, 

$$\left(\begin{array}{c}\text{Volume over which}\\ \text{changes in local density}\\ \text{are appreciable when}\\ \text{a reaction occurs.}\end{array}\right) \ll \Omega \ll \left(\begin{array}{c}\text{Volume over which}\\ \max\|\nabla\langle n_i(\mathbf{x})\rangle\| \neq 0,\\ \text{i.e. } \langle n(\mathbf{x})\rangle \text{ changes}\\ \text{appreciably.}\end{array}\right)$$


Stochastic transport 


In many ways, the transport part of the master equation is more straight- 

forward to deal with, so long as individual transport events are inde- 

pendent of all other transport and reaction events. (If there is density- 

dependent transport, then the van Kampen approximation of the trans- 

port is necessary — see, for example, C. A. Lugo and A. J. McKane (2008) 

Quasicycles in a spatial predator-prey model. Phys. Rev. E 78:051911.) 
The transport part of the compound master equation is, 


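$$\frac{\partial P(\{N^\lambda\},t)}{\partial t} = \sum_{\lambda \neq \mu} w_{\lambda\mu}\left[(N^\mu + 1)\,P(\ldots,N^\lambda - 1,\ldots,N^\mu + 1,\ldots,t) - N^\mu\,P(\{N^\lambda\},t)\right].$$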
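The transition probabilities will be proportional to the subvolume size 
$\Omega$: $w_{\lambda\mu} \to \Omega\,w_{\lambda\mu}$. Multiplying the transport master equation by $N^\lambda$ and 
summing over all $\{N^\lambda\}$, 

$$\frac{d\langle N^\lambda\rangle}{dt} = \Omega\sum_{\mu}\left[ w_{\lambda\mu}\langle N^\mu\rangle - w_{\mu\lambda}\langle N^\lambda\rangle\right].$$

Introducing the short-hand operator $W_{\lambda\mu} = w_{\lambda\mu} - \delta_{\lambda\mu}\left(\sum_{\mu'} w_{\mu'\lambda}\right)$, with 
summation over repeated indices implied, 

$$\frac{d\langle N^\lambda\rangle}{dt} = \Omega\,W_{\lambda\mu}\langle N^\mu\rangle. \tag{10.6}$$

In principle, the second-moment equations could be derived similarly. 
The resulting expressions are greatly simplified, however, if we express 
the second moment in terms of the factorial cumulant, 

$$K^{\lambda\mu} = \left[N^\lambda (N^\mu)^T\right] = \langle\langle N^\lambda (N^\mu)^T\rangle\rangle - \delta_{\lambda\mu}\,\mathrm{diag}\left[\langle N^\lambda\rangle\right].$$

Or, in the continuum limit, 

$$K(\mathbf{x}_1,\mathbf{x}_2) = \Omega\,C(\mathbf{x}_1,\mathbf{x}_2) - \delta(\mathbf{x}_1 - \mathbf{x}_2)\,\mathrm{diag}\left[\langle n(\mathbf{x})\rangle\right]. \tag{10.7}$$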


The factorial cumulant has an analogous relation to the Poisson process 
as the ordinary cumulant has to the Gaussian: specifically, all factorial 
cumulants after the first vanish for a Poisson process (see Exercise 3). 
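
The Poisson property of the factorial cumulant can be checked numerically. The sketch below is an illustration only: the sampler, the sample size and the mean of 50 are assumptions chosen for the demonstration. It estimates the scalar second factorial cumulant, i.e. the variance minus the mean, for Poisson counts and, for contrast, for counts that do not fluctuate at all.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method.

    Adequate for modest means; exp(-mean) must not underflow.
    """
    threshold = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def second_factorial_cumulant(samples):
    """Scalar version of [N N] = <<N^2>> - <N>: sample variance minus sample mean."""
    count = len(samples)
    mean = sum(samples) / count
    variance = sum((x - mean) ** 2 for x in samples) / count
    return variance - mean


rng = random.Random(1)
poisson_counts = [poisson_sample(50.0, rng) for _ in range(20000)]
fixed_counts = [50] * 20000

k_poisson = second_factorial_cumulant(poisson_counts)  # close to zero
k_fixed = second_factorial_cumulant(fixed_counts)      # exactly -50
```

For the Poisson samples the estimate is zero to within sampling error, whereas the fluctuation-free counts give minus the mean, so the factorial cumulant measures the departure from Poisson statistics.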

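As a consequence, the transport of the factorial cumulant is quite 
simple, 

$$\frac{dK^{\lambda\mu}}{dt} = \Omega\,W_{\lambda\lambda'}K^{\lambda'\mu} + \Omega\,W_{\mu\mu'}K^{\lambda\mu'}.$$

In the continuum limit, the discrete operator is replaced by the continuous 
operator, 

$$\Omega\sum_{\lambda'} W_{\lambda\lambda'} \to \int W(\mathbf{x}|\mathbf{x}')\,\bullet\;d\mathbf{x}',$$

so that the equations for the moments become, 

$$\frac{\partial\langle n(\mathbf{x})\rangle}{\partial t} = \int W(\mathbf{x}|\mathbf{x}')\,\langle n(\mathbf{x}')\rangle\,d\mathbf{x}',$$

$$\frac{\partial K(\mathbf{x}_1,\mathbf{x}_2)}{\partial t} = \int \left[W(\mathbf{x}_1|\mathbf{x}')\,K(\mathbf{x}',\mathbf{x}_2) + W(\mathbf{x}_2|\mathbf{x}')\,K(\mathbf{x}_1,\mathbf{x}')\right]d\mathbf{x}'.$$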
So far, no assumptions have been made about the specific form of W. 
If, for example, the jumps are isotropic and small compared to the length 
scale of variation in $\langle n(\mathbf{x})\rangle$, then the discrete operator converges to the 
diffusion operator, 

$$\Omega\sum_{\lambda'} W_{\lambda\lambda'} \to \mathbf{M}\,\nabla^2\,\bullet,$$

where $\mathbf{M} = \mathrm{diag}[d_1, d_2, \ldots]$ is the diagonal matrix of diffusion coefficients. 
In general, the operator W will converge to a linear differential operator 
$\mathbf{L}(\mathbf{x})$ that characterizes the transport in the deterministic equations, 

$$\frac{\partial\langle n(\mathbf{x})\rangle}{\partial t} = \mathbf{L}(\mathbf{x})\cdot\langle n(\mathbf{x})\rangle.$$


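The transport of the factorial cumulant is then written as a term-wise (or 
Hadamard) product with an operator $\mathcal{L}(\mathbf{x}_1,\mathbf{x}_2)$ defined in terms of $\mathbf{L}(\mathbf{x})$, 

$$\left[\mathcal{L}(\mathbf{x}_1,\mathbf{x}_2)\circ K(\mathbf{x}_1,\mathbf{x}_2)\right]_{ij} = \left[L_{ii}(\mathbf{x}_1) + L_{jj}(\mathbf{x}_2)\right]K_{ij}(\mathbf{x}_1,\mathbf{x}_2), \tag{10.8}$$

so that, 

$$\frac{\partial K(\mathbf{x}_1,\mathbf{x}_2)}{\partial t} = \mathcal{L}(\mathbf{x}_1,\mathbf{x}_2)\circ K(\mathbf{x}_1,\mathbf{x}_2).$$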


Combining reaction and transport 


The dynamics of the moments computed via the flux due to reaction and 
transport separately are given by the following contributions. 


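• Contributions due to reaction flux, 

$$\frac{d\langle n(\mathbf{x})\rangle}{dt} = \mathbf{S}\cdot\boldsymbol{\nu}(\langle n(\mathbf{x})\rangle,\mathbf{x}),$$

$$\frac{dC(\mathbf{x}_1,\mathbf{x}_2)}{dt} = \Gamma(\mathbf{x}_1)\cdot C(\mathbf{x}_1,\mathbf{x}_2) + \left[\Gamma(\mathbf{x}_2)\cdot C(\mathbf{x}_1,\mathbf{x}_2)\right]^T + \delta(\mathbf{x}_1 - \mathbf{x}_2)\,D(\mathbf{x}).$$


• Contributions due to transport flux, 

$$\frac{\partial\langle n(\mathbf{x})\rangle}{\partial t} = \mathbf{L}(\mathbf{x})\cdot\langle n(\mathbf{x})\rangle,$$

$$\frac{\partial K(\mathbf{x}_1,\mathbf{x}_2)}{\partial t} = \mathcal{L}(\mathbf{x}_1,\mathbf{x}_2)\circ K(\mathbf{x}_1,\mathbf{x}_2).$$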
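To combine the first-moment equations is straight-forward, 

$$\frac{\partial\langle n(\mathbf{x})\rangle}{\partial t} = \mathbf{S}\cdot\boldsymbol{\nu}(\langle n(\mathbf{x})\rangle,\mathbf{x}) + \mathbf{L}(\mathbf{x})\cdot\langle n(\mathbf{x})\rangle. \tag{10.9}$$

To combine the second-moments, either C must be converted to K, or 
vice versa. To keep the transport simple, C is written in terms of K, 

$$\frac{\partial K(\mathbf{x}_1,\mathbf{x}_2)}{\partial t} = \Gamma(\mathbf{x}_1)\cdot K(\mathbf{x}_1,\mathbf{x}_2) + \left[\Gamma(\mathbf{x}_2)\cdot K(\mathbf{x}_1,\mathbf{x}_2)\right]^T + \delta(\mathbf{x}_1 - \mathbf{x}_2)\,D(\mathbf{x})$$

$$\qquad + \mathcal{L}(\mathbf{x}_1,\mathbf{x}_2)\circ K(\mathbf{x}_1,\mathbf{x}_2) - \delta(\mathbf{x}_1 - \mathbf{x}_2)\,\mathrm{diag}\left[\mathbf{S}\cdot\boldsymbol{\nu}(\langle n(\mathbf{x}_1)\rangle,\mathbf{x}_1)\right]$$

$$\qquad + \delta(\mathbf{x}_1 - \mathbf{x}_2)\left[\Gamma(\mathbf{x}_1)\cdot\mathrm{diag}[\langle n(\mathbf{x})\rangle] + \mathrm{diag}[\langle n(\mathbf{x})\rangle]\cdot\Gamma^T(\mathbf{x}_1)\right]. \tag{10.10}$$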


The expression looks complicated, but reduces considerably in applica- 
tion. 


10.1.1 Stochastic simulation 


• J. Hattne, D. Fange and J. Elf (2005) “Stochastic reaction-diffusion simulation 
with MesoRD,” Bioinformatics 21: 2923-2924. 


• D. Fange, O. Berg, P. Sjoberg and J. Elf (2010) “Stochastic reaction-diffusion 
kinetics in the microscopic limit,” Proceedings of the National Academy of 
Sciences USA 107: 19820-19825. 


Conceptually, the most straightforward simulation of a 
spatially-inhomogeneous model is to explicitly invoke the discretization used 
in the previous sections (Fig. 10.1). Species in separate subvolumes are 
labeled as distinct, and the Gillespie simulation algorithm then takes in all species in 
all subvolumes as potential reactants, allowing reaction within a subvol- 
ume or transport between subvolumes as admissible reaction events. For 
large numbers of potential reactants, the simulation time can become 
prohibitive, but there are methods for accelerating the simulation (some 
examples are given in the references at the end of the chapter). 

The algorithm is well-illustrated by example. Consider a stochastic 
model of the reaction-diffusion Eq. 10.1, with linear reaction events, 

$$\frac{\partial n}{\partial t} = [\alpha - \beta n] + D\nabla^2 n.$$

We will confine ourselves to a model in one spatial dimension. Assuming 
unit reaction stoichiometry, within each subvolume one of four reactions 
is possible: synthesis at a constant rate $\alpha$, degradation at a linear rate 
$\beta n$, diffusion to the left and diffusion to the right. 
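To convert the continuous diffusion coefficient to a discrete stepping 
probability, we are motivated by the Smoluchowski model of Brownian 
motion (section 1.2.2 on p. 9), and model the transition rate to diffuse 
to the right or to the left as $\nu_{\mathrm{Diff}} = (D/\Omega^2)\,n^\lambda$. The reaction propensity 
vector within each subvolume is then, 

$$\boldsymbol{\nu}^\lambda = \left[\alpha,\; \beta n^\lambda,\; (D/\Omega^2)\,n^\lambda,\; (D/\Omega^2)\,n^\lambda\right]^T.$$

Due to diffusive transport, the stoichiometry matrix associated with 
reaction events involving $n^\lambda$ now couples to the neighbouring subvolumes to 
the right and to the left (rows $n^{\lambda-1}$, $n^{\lambda}$, $n^{\lambda+1}$; columns ordered as 
synthesis, degradation, diffusion to the left, diffusion to the right), 

$$\mathbf{S}^\lambda = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 1 & -1 & -1 & -1 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \tag{10.11}$$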


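The stochastic simulation including all subvolumes is driven by the 
composite propensity vector $\boldsymbol{\nu} = [\ldots,\boldsymbol{\nu}^{\lambda-1},\boldsymbol{\nu}^{\lambda},\boldsymbol{\nu}^{\lambda+1},\ldots]^T$ with stoichiometry 
matrix formed by joining together the individual stoichiometry matrices, 
Eq. 10.11, as, 

$$\begin{array}{c|cccc|cccc|cccc} & \nu_1^{\lambda-1} & \nu_2^{\lambda-1} & \nu_3^{\lambda-1} & \nu_4^{\lambda-1} & \nu_1^{\lambda} & \nu_2^{\lambda} & \nu_3^{\lambda} & \nu_4^{\lambda} & \nu_1^{\lambda+1} & \nu_2^{\lambda+1} & \nu_3^{\lambda+1} & \nu_4^{\lambda+1} \\ \hline n^{\lambda-1} & 1 & -1 & -1 & -1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ n^{\lambda} & 0 & 0 & 0 & 1 & 1 & -1 & -1 & -1 & 0 & 0 & 1 & 0 \\ n^{\lambda+1} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & -1 & -1 & -1 \end{array}$$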


It should be clear that $n^\lambda$ can increase by a synthesis reaction in 
subvolume $\lambda$, or diffusion from subvolumes $\lambda \pm 1$. Likewise, $n^\lambda$ can decrease by 
a degradation reaction in subvolume $\lambda$, or diffusion to subvolumes $\lambda \pm 1$. 
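
The simulation just described can be sketched in a few lines. This is a minimal illustration rather than an optimized routine, and it makes several assumptions that are not in the text: periodic boundaries, the function names, and the parameter values in the example run. Because every particle has the same per-particle degradation and hopping rates, the sketch first chooses the event type and then picks a particle uniformly at random, which is equivalent to sampling from the composite propensity vector.

```python
import math
import random


def pick_subvolume(n, total, rng):
    """Pick a subvolume with probability proportional to its occupancy."""
    target = rng.randrange(total)
    for lam, count in enumerate(n):
        if target < count:
            return lam
        target -= count
    raise RuntimeError("occupancies do not sum to total")


def simulate(n0, alpha, beta, D, omega, t_end, rng):
    """Direct-method Gillespie simulation on a periodic 1D lattice.

    Per subvolume: synthesis at rate alpha, degradation at rate beta*n,
    and hops to the left or right neighbour at rate (D/omega**2)*n each.
    Returns the occupancies at time t_end.
    """
    n = list(n0)
    M = len(n)
    hop = D / omega**2
    total = sum(n)
    t = 0.0
    while True:
        a_syn = alpha * M
        a_deg = beta * total
        a_hop = hop * total  # same total rate for left hops and for right hops
        a_total = a_syn + a_deg + 2.0 * a_hop
        if a_total <= 0.0:
            break
        t += -math.log(1.0 - rng.random()) / a_total  # exponential waiting time
        if t > t_end:
            break
        r = rng.random() * a_total
        if total == 0 or r < a_syn:
            n[rng.randrange(M)] += 1  # synthesis in a uniformly chosen subvolume
            total += 1
            continue
        lam = pick_subvolume(n, total, rng)
        r -= a_syn
        n[lam] -= 1
        if r < a_deg:
            total -= 1                 # degradation
        elif r < a_deg + a_hop:
            n[(lam - 1) % M] += 1      # hop to the left neighbour
        else:
            n[(lam + 1) % M] += 1      # hop to the right neighbour
    return n


# Example run (illustrative values): pure diffusion from a central pulse.
initial = [0] * 32
initial[16] = 50 * 32
snapshot = simulate(initial, alpha=0.0, beta=0.0, D=0.1, omega=1.0,
                    t_end=2.0, rng=random.Random(2024))
```

With `alpha = beta = 0` the total particle number is conserved, which is a convenient sanity check on the transport part of the update.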

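The connectivity in the first and last subvolumes determines the 
boundary conditions on the transport. For example, in a one 
spatial-dimension model with 3 subvolumes (arranged in increasing order from 
left to right) and periodic boundary conditions, the stoichiometry matrix 
generated by Eq. 10.11 is, 

$$\mathbf{S} = \left[\begin{array}{cccc|cccc|cccc} 1 & -1 & -1 & -1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 & -1 & -1 & -1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 1 & -1 & -1 & -1 \end{array}\right],$$

where the final entry of the first row is the transport from $n^3$ to $n^1$, 
and the third entry of the last row is the transport from $n^1$ to $n^3$. 

In the case of reflecting (or zero-flux or Neumann) boundary conditions, 
the stoichiometry matrix becomes, 

$$\mathbf{S} = \left[\begin{array}{cccc|cccc|cccc} 1 & -1 & 0 & -1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & -1 & -1 & -1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & -1 & -1 & 0 \end{array}\right],$$

where the third column is empty (no transport to the left from the first 
subvolume) and the last column is empty (no transport to the right from 
the last subvolume). 

Written this way, there are two spurious reactions $(\nu_3^1, \nu_4^3)$. In an 
optimal simulation routine, these two entries would be removed from the 
composite propensity vector, along with their associated columns in the 
stoichiometry matrix. For absorbing (or Dirichlet) boundary conditions, 
it is most convenient to introduce a phantom subvolume at each end of 
the domain into which particles can move, but can never leave. 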
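
The composite stoichiometry matrix can also be assembled programmatically. The sketch below assumes, for illustration, that the four reactions in each subvolume are ordered as synthesis, degradation, hop to the left, hop to the right; the function and argument names are hypothetical. For reflecting boundaries the two spurious hops are left as all-zero columns instead of being removed.

```python
def stoichiometry(M, boundary="periodic"):
    """Return the M x 4M composite stoichiometry matrix as a list of rows.

    Columns 4*lam .. 4*lam+3 belong to subvolume lam and are ordered
    [synthesis, degradation, hop left, hop right] (an assumed ordering).
    Any boundary value other than "periodic" is treated as reflecting.
    """
    S = [[0] * (4 * M) for _ in range(M)]
    for lam in range(M):
        first = 4 * lam
        S[lam][first] = 1        # synthesis adds one particle
        S[lam][first + 1] = -1   # degradation removes one particle
        for col, shift in ((first + 2, -1), (first + 3, +1)):
            dest = lam + shift
            if boundary == "periodic":
                dest %= M
            elif not 0 <= dest < M:
                continue         # reflecting wall: spurious reaction, zero column
            S[lam][col] -= 1     # particle leaves subvolume lam
            S[dest][col] += 1    # and arrives in the neighbouring subvolume
    return S


def column(S, j):
    """Extract column j of a matrix stored as a list of rows."""
    return [row[j] for row in S]


S_periodic = stoichiometry(3, "periodic")
S_reflecting = stoichiometry(3, "reflecting")
```

Every transport column sums to zero, since a hop only moves a particle between subvolumes, whereas the synthesis and degradation columns sum to +1 and -1.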


Example — Spatially-homogeneous steady-state 


Consider pure-diffusion, in the absence of reactions. For a scalar field 
n(x), the average obeys, 


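$$\frac{\partial\langle n(x)\rangle}{\partial t} = D\,\frac{\partial^2\langle n(x)\rangle}{\partial x^2}.$$

From Eq. 10.10, the second factorial cumulant obeys the analogous 
equation, 

$$\frac{\partial\left[n(x_1)n(x_2)\right]}{\partial t} = D\left[\frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2}\right]\left[n(x_1)n(x_2)\right]. \tag{10.12}$$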


Figure 10.2 shows how the average (black solid line) and one-standard 
deviation envelope (dashed line) evolve as time passes. The spatial dimension 
is discretized into 32 subvolumes, and the system is initialized with 50 x 32 
particles in the center subvolume (to give an average density of 50 par- 
ticles per subvolume). The grey solid line is the result of a stochastic 
simulation. 

What is the meaning of the standard deviation envelope (dashed line) 
in Fig. 10.2 within the context of the concepts we have seen in previous 
chapters? Recall that Einstein proposed the diffusion equation as the 
governing equation for the number density of an infinite collection of 
identical, but independent, Brownian particles. The dashed curves in 
Fig. 10.2 illustrate the expected error when a finite collection of identical 
and independent particles are used to infer the number density function, 
i.e. the ideal experimental error Perrin should expect in his experiments 
when validating Einstein’s theory (see Section 1.2.1, page 8). 

In analogy with the examples of previous chapters, the local fluctua- 
tions in density scale roughly as the square-root of the number of particles. 
Figure 10.3 illustrates how the local variance changes when the averaged 
density is varied. 
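
The square-root scaling can be illustrated without running a full simulation. At the steady state of pure diffusion with periodic boundaries, independent particles are uniformly distributed over the subvolumes, so the sketch below simply scatters particles at random and measures the relative size of the fluctuations. The number of replicates and the seed are illustrative assumptions; the three densities are those quoted for Fig. 10.3.

```python
import math
import random


def relative_fluctuation(mean_density, subvolumes=32, replicates=200, seed=0):
    """Standard deviation of the subvolume occupancy divided by its mean,
    for independent particles scattered uniformly over the subvolumes."""
    rng = random.Random(seed)
    total = mean_density * subvolumes
    squared_deviation = 0.0
    for _ in range(replicates):
        counts = [0] * subvolumes
        for _ in range(total):
            counts[rng.randrange(subvolumes)] += 1
        squared_deviation += sum((c - mean_density) ** 2 for c in counts) / subvolumes
    variance = squared_deviation / replicates
    return math.sqrt(variance) / mean_density


# Densities quoted for Fig. 10.3: 10, 50 and 100 particles per subvolume.
results = {density: relative_fluctuation(density) for density in (10, 50, 100)}
```

Multiplying each result by the square root of the density gives a number close to one, so the relative fluctuations fall off as the reciprocal square-root of the number of particles.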
Figure 10.2: Spatiotemporal variance for diffusive transport. For 
a system initialized with a Kronecker delta at the center subvolume, 
the average (solid black) evolves as a Gaussian. The standard 
deviation (dashed black) is computed using the factorial cumulant, Eq. 10.12. 
A. $\sqrt{2Dt} = 0.1$. B. $\sqrt{2Dt} = 0.15$. C. $\sqrt{2Dt} = 0.2$. D. $\sqrt{2Dt} = 1$. The 
thick gray curve is a realization of a stochastic simulation. The simulation 
is done on a lattice of 32 subvolumes, with an average number density of 
50 and periodic boundary conditions. The diffusion coefficient is D = 0.1. 
Figure 10.3: Steady-state variance for diffusive transport. 
Changing the average number density affects the relative magnitude of the 
variance. A. Average number density is 50. B. Average number density 
is 100. C. Average number density is 10. The relative magnitude of the 
fluctuations about the steady-state is proportional to the reciprocal of the 
square-root of the number of molecules. The thick gray curve is a realiza- 
tion of a stochastic simulation. The simulation is done on a lattice of 32 
subvolumes subject to periodic boundary conditions. Other parameters 
as in Fig. 10.2. 


Example — Spatially-inhomogeneous steady-state 


• E. Levine, P. McHale and H. Levine (2007) “Small regulatory RNAs may 
sharpen spatial expression patterns,” PLoS Computational Biology 3: 2356. 


• M. Scott, F. J. Poulin and H. Tang (2011) “Approximating intrinsic noise in 
continuous multispecies models,” Proceedings of the Royal Society A 467: 718. 


The method of compounding moments allows spatially-inhomogeneous 
steady-states to be accommodated without much effort. As an example, 
we will consider the mechanism proposed by Levine et al. to sharpen 
spatial profiles in developing embryos through the action of a small 
regulator molecule. Denoting the regulatory molecule by $\mu$, and the target 
by m, the model assumes that the small regulatory molecule alone 
diffuses, and that the interaction between the two species results in mutual 
annihilation, 

$$\frac{\partial m}{\partial t} = \alpha_m(x) - \beta_m m - \kappa\,m\mu,$$

$$\frac{\partial \mu}{\partial t} = \alpha_\mu(x) - \beta_\mu \mu - \kappa\,m\mu + D\,\frac{\partial^2\mu}{\partial x^2}. \tag{10.13}$$

Here, $\beta_m$ and $\beta_\mu$ are the linear degradation rates of the target and 
regulatory molecules, respectively, $\kappa$ is the interaction parameter and D is the 
diffusion coefficient of the regulatory molecule. The synthesis rates, 
$\alpha_m(x)$ and $\alpha_\mu(x)$, are spatially varying and are assumed to be anti-correlated 
sigmoidal functions, 


(0.5 — x) 


Qm(x) = tanh | 03 


+ | and a,,(x) = ; tanh os 
where the spatial domain length L has been scaled so that $0 \le x \le 1$. 
Under conditions of strong interaction, $\kappa\,\alpha_m(0)/\beta_m \gg D/L$, solving 
Eq. 10.13 subject to reflecting (Neumann) boundary conditions results in 
a target profile, $m(x,t)$, that exhibits a very sharp transition from 
high-to-low expression states. 

With the analysis derived in the previous section, it is possible to 
compute the effect fluctuations have on the variance of the interface 
position over an ensemble of realizations (i.e., how the fluctuations affect the 
accuracy of the sharp transition position). Using Eq. 10.10, the system 
of partial differential equations governing the factorial cumulants K is 
simply, 


0 DE 
ee 0 Km(x1) (21) 


where 


see | 


_ | [rm(t1)Mm(t2)]  [Mm(#1) ru (e2)] 
Kriz) = | [Mu (@1)Mm(e2)) [Mp (@r)rp(x2)] | 


2 


and $\circ$ is the component-wise Hadamard product defined in Eq. 10.8. 
Substituting the mean-field solution, Eq. 10.13, into the coupled 2D Poisson 
equations for K, the system is solved with absorbing (Dirichlet) boundary 
conditions. From the definition of the factorial cumulant, Eq. 10.7, the 
standard deviation about the averaged state is then computed 
Figure 10.4: Fluctuations about a spatially-inhomogeneous 
steady-state. Levine, McHale and Levine have proposed a mechanism 
based on small regulatory molecule interactions to generate a sharp 
interface in a target $m(x,t)$ during development. The location of the sharp 
interface in this model is surprisingly robust to intrinsic fluctuations, 
even for small molecule numbers. The mean level is shown as a black 
solid line, while the standard deviation envelope is denoted by the dashed 
curves. The gray solid line is a sample realization from stochastic 
simulation. A. The nominal parameter set of Levine, McHale and Levine 
($\beta_m = \beta_\mu = D = 0.01$; with 100 subvolumes); the maximum target level 
is about 200 molecules per subvolume. The location of the half-maximum 
target level is 0.282 ± 0.003 (rel. error < 1%). B. Decreasing the 
maximum target level two-fold, to 100 molecules per subvolume, the location 
of the half-maximum target level is 0.282 ± 0.004 (rel. error 1.4%). C. 
Decreasing the maximum target level ten-fold, to only 10 molecules per 
subvolume, the location of the half-maximum target level is 0.28 ± 0.01 
(rel. error 3.7%). The fluctuations about the high-state are approximately 
Poisson, but nevertheless, the variance about the threshold location is 
negligibly small. 
10.1.2 Frequency-domain analysis 


As we saw in Section 2.3, the Fourier transform or ‘spectrum’ of a stochas- 
tic function can reveal a great deal of information that is obscured in the 
untransformed signal. The same is true of stochastic fields — here, with 
the increased dimensionality, the conjugate transforms are from (time, 
space) to (frequency, wavenumber). We will examine the spectrum of 
the two-point spatial correlation function. In the same way that a peak 
in the frequency spectrum of a stochastic process indicates a dominant 
mode contributing to temporal patterning (i.e. oscillations), a peak in 
the wavenumber spectrum of a random field indicates a dominant peri- 
odic spatial patterning. 


Spectrum of the space-space correlation function 


• A. M. Turing (1952) The chemical basis of morphogenesis. Phil. Trans. R. Soc. 
Lond. B 237: 37. 


Pattern formation is ubiquitous in physical, chemical and biological 
systems. One mechanism through which patterns can arise is the 
evolution of a deterministically unstable steady state. For example, Turing 
showed that for a reaction-diffusion model tending to a homogeneous equi- 
librium state, diffusion can act to destabilize the steady solution. More- 
over, the system becomes destabilized to only a certain range of spatial 
modes, leading to the emergence of regular patterning. We explore this 
mechanism in more detail below. 

In a strictly deterministic system, the local stability is characterized 
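by the evolution of some small perturbation, $\mathbf{x}_p$, about the equilibrium 
state. For sufficiently small amplitudes, the perturbation field obeys the 
linearized mean-field equation, 

$$\frac{\partial \mathbf{x}_p}{\partial t} = \mathbf{A}\cdot\mathbf{x}_p + \mathbf{D}\cdot\nabla^2\mathbf{x}_p,$$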


where A is the Jacobian of the reaction dynamics and D is the diffusion 
matrix. Taking the Laplace and Fourier transforms in time and space, 
respectively, the stability of the equilibrium state is determined by the 
resolvent equation, 


$$\det\left[\lambda\mathbf{I} - \mathbf{A} + k^2\mathbf{D}\right] = 0.$$


The equilibrium is asymptotically stable if $\mathrm{Re}[\lambda] < 0$. Even though $\mathbf{A}$ 
may be stable (i.e. the eigenvalues of $\mathbf{A}$ lie in the left-half of the complex 
plane), Turing's claim is that for a certain class of diffusivity matrices 
$\mathbf{D}$ and a range of wavenumbers k, the roots $\lambda$ of the resolvent equation 
are shifted to the right-hand side of the complex plane and the system 
becomes unstable. 
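
Turing's claim can be explored numerically by solving the resolvent equation for a range of wavenumbers. The sketch below assumes a two-species model with a diagonal diffusivity matrix; the Jacobian `A` and the diffusion coefficients are made-up illustrative values, not taken from any model in the text.

```python
import cmath


def growth_rate(A, D1, D2, k):
    """Largest real part of the roots of det[lambda*I - A + k^2 D] = 0,
    i.e. of the eigenvalues of A - k^2 D, with D = diag(D1, D2)."""
    a11 = A[0][0] - D1 * k**2
    a22 = A[1][1] - D2 * k**2
    trace = a11 + a22
    det = a11 * a22 - A[0][1] * A[1][0]
    root = cmath.sqrt(trace**2 - 4.0 * det)
    return max(((trace + root) / 2.0).real, ((trace - root) / 2.0).real)


# Illustrative Jacobian: trace < 0 and determinant > 0, so the reaction
# kinetics on their own are stable.
A = [[1.0, -2.0], [3.0, -4.0]]
wavenumbers = [0.05 * j for j in range(41)]  # k from 0 to 2

equal_diffusion = [growth_rate(A, 1.0, 1.0, k) for k in wavenumbers]
unequal_diffusion = [growth_rate(A, 1.0, 40.0, k) for k in wavenumbers]
```

With equal diffusion coefficients every mode decays, while for the strongly unequal pair a band of wavenumbers around k ≈ 0.7 acquires a positive growth rate even though the k = 0 mode remains stable.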

The analysis is made more transparent by considering a two-species 
model in one spatial dimension. For a two-species model, the criteria for 
A to have stable eigenvalues are, 

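$$\mathrm{tr}\,\mathbf{A} < 0 \;\Rightarrow\; a_{11} + a_{22} < 0, \quad\text{and}$$
$$\det\mathbf{A} > 0 \;\Rightarrow\; a_{11}a_{22} - a_{12}a_{21} > 0.$$

The presence of diffusion introduces a re-scaling of the diagonal elements 
of $\mathbf{A}$, 

$$a_{11} \to \hat{a}_{11} = a_{11} - D_1 k^2,$$
$$a_{22} \to \hat{a}_{22} = a_{22} - D_2 k^2.$$

The conditions for stability then become, 

$$\mathrm{tr}\left[\mathbf{A} - k^2\mathbf{D}\right] < 0,$$
$$\det\left[\mathbf{A} - k^2\mathbf{D}\right] > 0.$$

For diffusion to destabilize the steady-state, it must be that, 

$$\det\left[\mathbf{A} - k^2\mathbf{D}\right] < 0 \;\Rightarrow\; \hat{a}_{11}\hat{a}_{22} - a_{12}a_{21} < 0, \tag{10.15}$$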


since the condition on the trace is automatically satisfied. The above 
equation can be written explicitly as a quadratic in k?, 


$$Q(k^2) = k^4 - \frac{D_1 a_{22} + D_2 a_{11}}{D_1 D_2}\,k^2 + \frac{a_{11}a_{22} - a_{12}a_{21}}{D_1 D_2} < 0.$$

A sufficient condition for instability is that the minimum of $Q(k^2) < 0$. 
Setting the derivative of Q to zero, we arrive at an explicit expression for 
$k^2_{\min}$ that minimizes $Q(k^2)$, 

$$k^2_{\min} = \frac{D_1 a_{22} + D_2 a_{11}}{2 D_1 D_2}.$$

Finally, the condition for Turing-type instability can then be written as, 

$$Q(k^2_{\min}) < 0 \;\Rightarrow\; \det\mathbf{A} < \frac{\left(D_1 a_{22} + D_2 a_{11}\right)^2}{4 D_1 D_2}. \tag{10.16}$$
The range of parameter space for which A is stable, but Eq. 10.16 is 
satisfied, is called the Turing space. It is in this parameter regime that 
the system is unstable to small perturbations and able to form regular 
patterning. 
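
The conditions above translate directly into a short membership test for the Turing space. The function name, the returned dictionary and the numerical values below are illustrative assumptions. The sketch also requires $k^2_{\min} > 0$, which is not stated explicitly above but is needed for the minimizing wavenumber to be real.

```python
def turing_check(A, D1, D2):
    """Test a two-species Jacobian A and diffusion coefficients D1, D2
    against the stability criteria for A and the condition of Eq. 10.16."""
    (a11, a12), (a21, a22) = A
    det_A = a11 * a22 - a12 * a21
    reaction_stable = (a11 + a22 < 0) and (det_A > 0)
    s = D1 * a22 + D2 * a11
    k2_min = s / (2.0 * D1 * D2)                     # minimizer of Q(k^2)
    q_min_negative = det_A < s**2 / (4.0 * D1 * D2)  # Eq. 10.16
    return {
        "reaction_stable": reaction_stable,
        "k2_min": k2_min,
        # k2_min > 0 is an added requirement: the unstable wavenumber must be real.
        "in_turing_space": reaction_stable and k2_min > 0 and q_min_negative,
    }


A = [[1.0, -2.0], [3.0, -4.0]]          # illustrative Jacobian
unequal = turing_check(A, 1.0, 40.0)    # strongly unequal diffusion
equal = turing_check(A, 1.0, 1.0)       # equal diffusion
```

For this Jacobian the unequal-diffusion case lies in the Turing space with $k^2_{\min} = 0.45$, whereas the equal-diffusion case does not.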

The two major limitations associated with applications of Turing-type 
pattern-formation in the modeling of natural systems are that, 


1. The difference in the diffusion coefficients $D_1$ and $D_2$ must be quite 
large ($D_2 \gg D_1$), typically at least an order of magnitude, to satisfy 
the necessary conditions for pattern formation. In reality, unless one 
of the species is immobilized, the diffusion coefficients are rarely very 
different. 


2. The instability that arises is periodic with some characteristic 
wavenumber close to $k_{\min}$. In reality, spatial patterns in natural systems 
are irregular, leading to a distribution in the spectrum spread across 
a range of wavenumbers. 


We shall see below that both objections hold for deterministic models, 
and that once intrinsic fluctuations in the populations are admitted, the 
severity of both limitations is greatly reduced. 


Turing instabilities in stochastic systems 


• Y. Kuramoto (1973) Fluctuations around steady states in chemical kinetics. 
Progress of Theoretical Physics 49: 1782. 


Noise-induced pattern formation arises in a given system for param- 
eters outside of the Turing space. These patterns are revealed as peaks 
in the spatial correlation spectrum. This particular mechanism of noise- 
induced spatial patterning is analogous to the ‘resonant amplification’ 
mechanism underlying temporal instabilities of spatially-homogeneous sto- 
chastic models (see Section 9.2 on page 203), and can be quantified 
through a local analysis about the equilibrium point. Instabilities 
arising from excitable dynamics require a more global analysis, and are not 
amenable to the methods described in this section. The spatial correlation 
spectrum is related to the Fourier Transform of the factorial cumulants, 
$\mathbf{K}$. Eq. 10.10 governs the factorial cumulants for all time; however, the 
linearity of the moment equations allows particularly convenient 
evaluation of the steady-state fluctuations in the system. If the deterministic 
system approaches a spatially-homogeneous steady-state, then $\mathbf{K}(\mathbf{x}_1,\mathbf{x}_2)$ 
becomes a function of spatial separation, $\mathbf{K}(\mathbf{x}_1,\mathbf{x}_2) \to \mathbf{K}(x)$, where 
$x = |\mathbf{x}_1 - \mathbf{x}_2|$. Furthermore, the factorial cumulant becomes a symmetric 
matrix, and the spatial dependence disappears from the coefficients, i.e., 
$\Gamma(\mathbf{x}_1) = \Gamma(\mathbf{x}_2) = \Gamma$, reducing Eq. 10.10 to, 


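$$\Gamma\cdot\mathbf{K} + \mathbf{K}\cdot\Gamma^T + \delta(x)\times\left\{\mathbf{S}\cdot\mathrm{diag}[\boldsymbol{\nu}]\cdot\mathbf{S}^T - \mathrm{diag}[\mathbf{S}\cdot\boldsymbol{\nu}]\right\}$$

$$\qquad + \delta(x)\times\left\{\Gamma\cdot\mathrm{diag}[\langle n\rangle] + \mathrm{diag}[\langle n\rangle]\cdot\Gamma^T\right\} + 2\mathbf{L}(x)\left[\mathbf{K}(x)\right] = 0. \tag{10.17}$$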


The presence of the delta function and the linearity of the constant- 
coefficient partial differential equation yields a simple expression for the 
Fourier transform of the factorial cumulant K(k). As in the analysis 
above, we focus on a one-dimensional two-species model with diffusion 
characterized by the diffusivity matrix, 


$$\mathbf{D} = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix},$$


and therefore, 


The Fourier Transform of Eq. 10.17 results in a system of linear algebraic 
equations, 


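$$\Gamma\cdot\hat{\mathbf{K}}(k) + \hat{\mathbf{K}}(k)\cdot\Gamma^T - k^2\begin{bmatrix} 2D_1\hat{K}_{11}(k) & (D_1 + D_2)\hat{K}_{12}(k) \\ (D_1 + D_2)\hat{K}_{12}(k) & 2D_2\hat{K}_{22}(k) \end{bmatrix} + \mathbf{F} = 0, \tag{10.18}$$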


where the terms multiplied by the delta function are represented by the 
constant matrix, 


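$$\mathbf{F} = \mathbf{S}\cdot\mathrm{diag}[\boldsymbol{\nu}]\cdot\mathbf{S}^T - \mathrm{diag}[\mathbf{S}\cdot\boldsymbol{\nu}] + \Gamma\cdot\mathrm{diag}[\langle n\rangle] + \mathrm{diag}[\langle n\rangle]\cdot\Gamma^T.$$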
Notice that for a one-species system, 


, _F 
K(k) = se yap" 


describing an exponential correlation function, peaked at k = 0, with 
characteristic decay length « = ,/D/2T, sometimes called the Kuramoto 
length (after Y. Kuramoto, as suggested by N. G. van Kampen). In a 
multi-species system, Eq. 10.18 could be used to determine the character- 
istic length scale of the fluctuations that in turn is useful in approximate 
accelerated stochastic simulation algorithms. 




To illustrate the utility of the method, in the next section we consider 
in detail a model exhibiting both deterministic and stochastic pattern 
formation. We compare the results from numerical simulations to the 
analytic expressions for the first and second moments of the fluctuations 
about the equilibrium solution. The Fourier Transform of the factorial 
cumulant is used to construct a phase diagram that maps out the different 
regions of parameter space, including regions of noise induced pattern 
formation. 


Example — Noise-induced patterning 


• A. Gierer and H. Meinhardt (1972) A theory of biological pattern formation.
Biol. Cybern. 12: 30.


• A. J. Koch and H. Meinhardt (1994) Biological pattern formation: from basic
mechanisms to complex structure. Rev. Mod. Phys. 66: 1481.


A simple network that can be used to illustrate the effect of intrin- 
sic fluctuations on pattern-forming instabilities is the activator-inhibitor 
model of Gierer and Meinhardt. Rescaling time and space leads to a min- 
imal toy model given by the following system governing the concentration 
of activator A and inhibitor H, 


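\[
\begin{aligned}
\frac{\partial A}{\partial t} &= \frac{A^2}{H} - A + \sigma_A + \nabla^2 A, \\
\frac{\partial H}{\partial t} &= \rho_H \left( A^2 - H \right) + D_H \nabla^2 H.
\end{aligned}
\tag{10.19}
\]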


The reaction part of the dynamics admits a single steady-state (denoted 
by the superscript ‘ss’), 


\[
A^{ss} = (1 + \sigma_A), \qquad H^{ss} = (1 + \sigma_A)^2,
\]


whose stability depends upon the two control parameters $\sigma_A$ and $\rho_H$, with
asymptotic stability guaranteed for 


\[
\rho_H > \rho_c = \frac{(1 - \sigma_A)}{(1 + \sigma_A)}. \tag{10.20}
\]


The model is temporally unstable for $\rho_H < \rho_c$, which is depicted as the
hatched region in Fig. 10.5. The condition for the appearance of Turing
instabilities is,

\[
D_H > D_c = \rho_H \, \frac{(1 + \sigma_A) \left( 3 + \sigma_A + 2 \sqrt{2 (1 + \sigma_A)} \right)}{(\sigma_A - 1)^2}. \tag{10.21}
\]

The Turing space for this model as a function of the inhibitor degradation
rate $\rho_H$ is depicted as the solid black region in Fig. 10.5. As the reaction
kinetics become more stable ($\rho_H > \rho_c$), a larger disparity in the diffusivity
$D_H$ is required to generate pattern-forming Turing instabilities. Notice
that if the steady-state $A^{ss}(x)$ exhibits stable pattern formation, then
the synthesis term in the dynamics of H in Eq. 10.19 necessarily becomes
spatially dependent. In that way, a Turing instability in only one reactant
is impossible for the deterministic case.
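
The two stability thresholds can be turned into a small classifier for points in the phase plot. The sketch below assumes the thresholds take the form $\rho_c = (1 - \sigma_A)/(1 + \sigma_A)$ and $D_c = \rho_H (1 + \sigma_A)(3 + \sigma_A + 2\sqrt{2(1 + \sigma_A)})/(\sigma_A - 1)^2$; the function names and the three region labels are hypothetical and are introduced only for illustration.

```python
import math


def critical_values(sigma_a, rho_h):
    """Return (rho_c, D_c) for the rescaled activator-inhibitor model.

    Assumes rho_c = (1 - sigma_a) / (1 + sigma_a) and
    D_c = beta(sigma_a) * rho_h, with beta as written below.
    """
    rho_c = (1.0 - sigma_a) / (1.0 + sigma_a)
    beta = (
        (1.0 + sigma_a)
        * (3.0 + sigma_a + 2.0 * math.sqrt(2.0 * (1.0 + sigma_a)))
        / (sigma_a - 1.0) ** 2
    )
    return rho_c, beta * rho_h


def classify(sigma_a, rho_h, d_h):
    """Label a parameter point of the deterministic model (hypothetical labels)."""
    rho_c, d_c = critical_values(sigma_a, rho_h)
    if rho_h < rho_c:
        return "oscillatory"  # reaction kinetics are temporally unstable
    if d_h > d_c:
        return "turing"       # diffusion-driven (Turing) instability
    return "stable"           # asymptotically stable steady state


if __name__ == "__main__":
    # Parameter choices quoted in the caption of Fig. 10.6, with sigma_A = 0.1.
    for label, (rho_h, d_h) in {"a": (0.5, 2), "b": (0.9, 10),
                                "c": (0.9, 7), "d": (0.9, 5)}.items():
        print(label, classify(0.1, rho_h, d_h))
```

Under these assumptions the points c and d both fall in the deterministically stable region, which is where the noise-induced patterns discussed in the text appear.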

The stochastic analogue of the deterministic system, Eqs. 10.19, re- 
quires specification of the stoichiometry matrix, S, and the propensity 
vector, v. For simplicity, we assume the stoichiometry matrix consists of 
unit steps. Enforcing periodic boundary conditions in space, with subvol- 
umes of unit length, Fig. 10.6 illustrates the behavior of the model for the 
parameter choices indicated in Fig. 10.5. For parameters in the region of 
deterministic instability, the stochastic simulation shows regular tempo- 
ral oscillations displayed as vertical bands in Fig. 10.6A. In contrast to a 
simulation of the deterministic model (not shown), the peak heights are 
nonuniform across the spatial domain, and there is slight irregularity in 
the period of oscillation. Fig. 10.6B shows the result of a simulation in 
the Turing space of the deterministic model. Here, the stochastic simu- 
lation shows regular spatial patterning displayed as horizontal bands. As 
in Fig. 10.6A, there is some irregularity in the wavelength of the pattern, 
and movement of the edges in time. In all, for parameters chosen from 
deterministically unstable regions of the phase plot, the stochastic simu- 
lation and the deterministic model are in qualitative agreement (so long 
as the fluctuations remain subdominant to the mean field behaviour). 
Of interest are those parameter choices for which the behaviour of the
stochastic model no longer agrees with that of the deterministic model.

Close to the Turing space of the deterministic model, the steady-state 
is only weakly stable (Fig. 10.5, dark grey region). The action of the
fluctuations brought about by the discrete reaction events and diffusive
transport is enough to destabilize the system and allow spatial patterns
to form (Fig. 10.6C). Compared to the spatial patterns in Fig. 10.6B,
the boundaries are far more ragged and the wavelength of the pattern is 
distributed over a range of values. Nevertheless, there is obvious pattern 
formation in a parameter regime where the deterministic model is asymp- 
totically stable. Fig. 10.6D shows an example simulation from a region 
of the parameter space demonstrating a surprising difference between the 
stochastic and deterministic models (Fig. 10.5, light grey region). Here, 
the spatial pattern is less distinct than in Fig. 10.6C, though still ob- 
servable. What is remarkable is that patterning in the activator is not 
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Figure 10.5: Phase plot for the activator-inhibitor model, Eq. 10.19. Here, 
and throughout, $\sigma_A = 0.1$. The deterministic model exhibits oscillatory
dynamics for inhibitor degradation rate $\rho_H < \rho_c = (1 - \sigma_A)/(1 + \sigma_A)$
(hatched), asymptotic stability for $\rho_H > \rho_c$ (white) and Turing-type
instability for $D_H > \beta \times \rho_H$ (black), where $\beta$ is a function of $\sigma_A$
(Eq. 10.21). A maximum in the spatial correlation function at nonzero wavenumber
reveals parameter regimes where noise-induced pattern formation occurs in
both activator and inhibitor (dark grey) and activator alone (light grey).
The points marked a,b,c and d correspond to parameter choices for de- 
tailed stochastic simulation (Fig. 10.6), and computation of the spectrum 
of the spatial correlation function (Fig. 10.7). The grey regions are not 
true ‘phases’ from a dynamical systems point of view — although the 
behaviour of the stochastic model in those regimes does exhibit spatial 
structure not observed in the deterministic model. 
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Figure 10.6: Stochastic simulation of the reaction-diffusion master equa- 
tion using the Gillespie direct method (Section 4.2.1 on page 90) for the 
parameter choices indicated in Fig. 10.5. The simulation uses periodic 
boundary conditions in space over the domain L = 90 with unit sub- 
volumes (for clarity, only the middle third of the spatial domain is illus- 
trated). The simulation is initialized with approximately 200 molecules 
of each species in each subvolume. The density plots correspond to the 
level of activator A. A. $(\rho_H, D_H) = (0.5, 2)$: The deterministic system is
temporally unstable. The stochastic simulation exhibits temporal oscillations.
B. $(\rho_H, D_H) = (0.9, 10)$: The deterministic system exhibits a Turing
instability, evident in the stochastic simulation as spatial patterning.
C. $(\rho_H, D_H) = (0.9, 7)$: The deterministic model is asymptotically stable
in this parameter regime. Nevertheless, some spatial patterning is observable
and is more clearly evident in the spatial spectrum, Fig. 10.7A.
D. $(\rho_H, D_H) = (0.9, 5)$: The deterministic system is stable, although
the stochastic simulation data exhibits some evidence of pattern-forming 
instability. Surprisingly, in this parameter regime patterning in the acti- 
vator occurs in the absence of conjugate patterning in the inhibitor (see 
Fig. 10.7B). 
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accompanied by a patterning of the inhibitor, a feature that is more clearly 
seen in the spectra of the spatial correlation functions. Note, however, 
that the temporal stability of these spatial patterns requires analysis of 
the full spatio-temporal correlation. 

At steady-state, the Wiener-Khinchin theorem (Section 2.3 on page 39) 
can be invoked to relate the Fourier transform of the simulation data to 
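the covariance spectrum of the fluctuations $S_{ij}(k)$,

\[
S_{ij}(k) = \int_{-\infty}^{\infty} \langle\langle n_i(0) n_j(x) \rangle\rangle \, e^{-ikx} \, dx = \hat{K}_{ij}(k) + \delta_{ij} \langle n_i(0) \rangle. \tag{10.22}
\]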


As a consequence, the analytical value of S(k), rather than K(k), ob- 
tained from the linear noise approximation will be compared to the value 
computed in the numerical simulations. Spatial patterning is evident in 
the spectrum of the spatial correlation function as a peak at nonzero 
wavenumber. Fig. 10.7A shows the spectra of the activator and inhibitor 
corresponding to the simulation data shown in Fig. 10.6C. There is a 
narrow peak in both the activator and inhibitor spectra close to k = 0.6, 
corresponding to the spatial pattern of wavelength approximately 10. The 
activator spectrum in Fig. 10.7B likewise shows a peak close to k = 0.6, 
but broader than in Fig. 10.7A. Notice, however, there is no discernible
peak in the inhibitor spectrum; i.e., there is no finite k > 0
for which we find a local maximum in the spectrum characterized by
$dS_{HH}/dk = 0$. The asymmetry between the stability of the activator and
the inhibitor arises from the positive feedback loop in activator synthesis 
— consequently, we would expect one-species patterning to be a generic 
consequence of autoactivator-inhibitor models. Although this particular 
example is a toy model it clearly demonstrates the qualitative and quanti- 
tative differences that can arise due to intrinsic fluctuations in spatially- 
varying systems; effects that can arise in other more complex models. 
Furthermore, for sufficiently large positive feedback, it may be possible 
for a model to exhibit noise-induced spatial structure without requiring 
an associated disparity in the transport coefficients. 
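
The comparison between simulation and theory relies on estimating the spatial spectrum from simulation snapshots. A minimal sketch of one way to do this is shown below, assuming periodic boundary conditions and equally spaced subvolumes; the function names, the array layout (one row per snapshot) and the periodogram normalization are assumptions made for illustration, not the procedure used to produce Fig. 10.7.

```python
import numpy as np


def spatial_spectrum(snapshots, dx=1.0):
    """Averaged periodogram of the fluctuations about the mean.

    snapshots : array of shape (n_snapshots, n_sites), copy numbers per
                subvolume on a periodic domain (assumed layout).
    Returns the wavenumbers k and the estimated spectrum S(k).
    """
    data = np.asarray(snapshots, dtype=float)
    n_sites = data.shape[1]
    fluct = data - data.mean()              # remove the mean level
    ft = np.fft.rfft(fluct, axis=1)         # transform along space
    # |FT|^2 / n_sites is the transform of the circular autocovariance.
    power = (np.abs(ft) ** 2).mean(axis=0) * dx / n_sites
    k = 2.0 * np.pi * np.fft.rfftfreq(n_sites, d=dx)
    return k, power


def peak_wavenumber(k, power):
    """Wavenumber of the largest spectral peak, ignoring k = 0."""
    idx = 1 + np.argmax(power[1:])
    return k[idx]


if __name__ == "__main__":
    # Synthetic check: a pattern of wavelength 10 on a domain of 90 sites.
    x = np.arange(90)
    pattern = 200.0 + 10.0 * np.cos(2.0 * np.pi * x / 10.0)
    k, s = spatial_spectrum(np.tile(pattern, (5, 1)))
    print(peak_wavenumber(k, s))  # close to 2*pi/10
```

A peak at nonzero wavenumber in the returned spectrum is the signature of spatial patterning described in the text; a spectrum whose maximum sits at the smallest wavenumbers indicates no preferred wavelength.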
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Figure 10.7: In the region of noise-induced spatial patterning, the spa- 
tial correlation spectrum is computed from the stochastic simulation 
(filled squares and diamonds), and compared to the analytic estimate, 
Eq. 10.22 (solid line). A. $(\rho_H, D_H) = (0.9, 7)$: There is a strong peak
at nonzero wavenumber in both the activator and inhibitor spectra,
indicating almost-periodic noise-induced spatial patterning. B. $(\rho_H, D_H) =
(0.9, 5)$: Farther from the Turing space, the system is more stable, and
the peaks in the spectra are less pronounced. What is remarkable is that
it is possible to have patterning in the activator without any evidence 
of patterning in the inhibitor — behaviour that is not possible in the de- 
terministic model. We note that the temporal stability of these spatial 
patterns is not captured by the spatial correlation function, and requires 
analysis of the full spatiotemporal correlation. 


Exercises


1. Show that the sum of two independent, Poisson variables is again a 
Poisson distributed variable. 


2. Show that for N = 0, 1, 2, ..., the factorial moments $m^f_n$,

\[
m^f_n = \langle N (N - 1) \cdots (N - n + 1) \rangle, \qquad (n \geq 1),
\]

are generated by the probability generating function, $F(z) = \langle z^N \rangle$,
via

\[
F(1 - x) = \sum_{m=0}^{\infty} \frac{(-x)^m}{m!} \, m^f_m. \tag{10.23}
\]

3. The factorial cumulants $\kappa^f_m$ are defined by,

\[
\log F(1 - x) = \sum_{m=1}^{\infty} \frac{(-x)^m}{m!} \, \kappa^f_m.
\]

Express the first three in terms of the moments. Show that the
Poisson distribution is characterized by the vanishing of all cumulants
beyond $\kappa^f_1$.


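4. The multivariate factorial moments, denoted by curly brackets, are
generalized from Eq. 10.23,

\[
\left\langle \prod_j (1 - x_j)^{N_j} \right\rangle = \sum_{\{m\}} \prod_j \frac{(-x_j)^{m_j}}{m_j!} \left\{ N_1^{m_1} N_2^{m_2} \cdots \right\}.
\]

In a similar way, the multivariate factorial cumulants, denoted by
square brackets, are,

\[
\log \left\langle \prod_j (1 - x_j)^{N_j} \right\rangle = \sum_{\{m\}}{}^{\prime} \prod_j \frac{(-x_j)^{m_j}}{m_j!} \left[ N_1^{m_1} N_2^{m_2} \cdots \right],
\]

where the prime excludes the term with all $m_j = 0$. Show that
$[N_i N_j] = \langle N_i N_j \rangle - \langle N_i \rangle \langle N_j \rangle - \delta_{ij} \langle N_i \rangle$.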

5. Consider a stochastic model for simple exponential growth,

\[
N \rightarrow N + 1.
\]

(a) Derive an expression for the average and variance.

(b) Suppose N diffuses at a rate D in one spatial dimension. Compute
the average and variance in a domain (-L/2, L/2) with
the uniform initial condition $N(x, 0) = n_0/L$.

(c) Take the limit $L \to \infty$ and show that the results in part 5b
coincide with part 5a.


CHAPTER 11


SPECIAL TOPICS


This final chapter contains supplemental topics that don’t really fit in the 
rest of the notes. They are either suggested extensions of topics covered 
in previous chapters, or very short discussions of topics that may be of 
interest to some readers, along with a list of relevant references. 


11.1 Random Walks and Electric Networks 


• P. G. Doyle and J. L. Snell, Random Walks and Electrical Networks
(Mathematical Association of America, 1984).


• S. Redner, A Guide to First-Passage Processes (Cambridge University Press,
2001).


There is a direct analogy between the escape probability for a random
walk along a lattice (Figure 11.1a) and the potential along the nodes of an
array of resistors (Figure 11.1b). The equivalence is made precise by
observing the correspondence of the steady-state master equation governing
the probability and the Kirchhoff laws governing the potential.
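
The correspondence can be illustrated numerically: both the escape probability of an unbiased walker and the potential in a network of equal resistors equal, at each interior node, the average of the values at the neighbouring nodes. The sketch below solves that averaging condition by simple relaxation; the function name, the dictionary-based graph representation and the stopping tolerance are assumptions introduced for illustration.

```python
def escape_probabilities(neighbors, boundary, tol=1e-12, max_sweeps=100000):
    """Relaxation solution of the discrete averaging (Laplace) condition.

    neighbors : dict mapping each node to a list of adjacent nodes
                (unbiased walk, or equal resistors on every edge).
    boundary  : dict mapping boundary nodes to fixed values, e.g. 1.0 for
                escape points (1 volt) and 0.0 for police (ground).
    Returns a dict with the escape probability (potential) at every node.
    """
    p = {node: boundary.get(node, 0.0) for node in neighbors}
    interior = [node for node in neighbors if node not in boundary]
    for _ in range(max_sweeps):
        largest_change = 0.0
        for node in interior:
            # Interior value = average over the neighbouring nodes.
            new = sum(p[m] for m in neighbors[node]) / len(neighbors[node])
            largest_change = max(largest_change, abs(new - p[node]))
            p[node] = new
        if largest_change < tol:
            break
    return p


if __name__ == "__main__":
    # One-dimensional example: police at node 0, escape at node 4.
    chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(escape_probabilities(chain, {0: 0.0, 4: 1.0}))
```

For the one-dimensional chain the result is the linear profile familiar from a string of identical resistors in series, which is also the gambler's-ruin escape probability.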

Beyond the analogy, recasting a high-dimensional random walk in 
terms of the electrical properties of a high-dimensional array of resis- 
tors allows many difficult theorems to be proved with ease. Consider, for 
example, the question of recurrence in high-dimensional random walks. If 
a random walker is guaranteed to return to the origin at some point in the 
wandering, then the walk is called recurrent. If there is some probability 
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Figure 11.1: Equivalence of a random walk and the potential along 
a resistor array. A) Escape probability from a two-dimensional lattice 
for a random walker. The nodes marked E are escape points, while the
nodes marked P are police. B) An array of 1 Ω resistors, with boundary
points held at 1 volt, or grounded. C) The escape probability, or equiva- 
lently the potential, at the interior nodes. Redrawn from Figures 2.1 and 
2.2 of Doyle and Snell Random Walks and Electrical Networks (1984). 


of never returning, then the walk is transient. Pólya proved the following
theorem:


PÓLYA’S THEOREM: A simple random walk on a d-dimensional
lattice is recurrent for d = 1, 2 and transient for d > 2.


In the language of resistor arrays, Pólya’s theorem can be restated as: the
random walk in d dimensions is recurrent if and only if the resistance to
infinity is infinite. Estimating the resistance to infinity is a far simpler
approach than Pólya’s original proof of the problem.


11.2 Fluctuations Along a Limit Cycle 


• K. Tomita, T. Ohta and H. Tomita (1974) “Irreversible circulation and orbital
revolution,” Progress of Theoretical Physics 52: 1744.


• F. Ali and M. Menzinger (1999) “On the local stability of limit cycles,” Chaos
9: 348.


• M. Scott, B. Ingalls and M. Kaern (2006) “Estimations of intrinsic and extrinsic
noise in models of nonlinear genetic networks,” Chaos 16: 026107.


In Section 5.1.2, we used the linear noise approximation to estimate 
the statistics of the steady-state fluctuations in the Brusselator model. 
What makes the Brusselator model interesting is that over a range of 
parameter values, the system exhibits a stable limit cycle. With a change 
Figure 11.2: Rotating change of basis along the limit cycle. To separate
the fluctuations tangent to the macroscopic limit cycle from those
perpendicular, we make a rotating change of basis from $n_1$-$n_2$ space to
r-s space.


of basis, the linear noise approximation can again be used to characterize
the fluctuations around the limit cycle. That change of basis is the subject
of the present section.

Limit cycle regime b > 1 + a

From the Fokker-Planck equation for $\Pi(\alpha_1, \alpha_2, t)$ far away from the
critical line b = 1 + a, Eq. 5.23, we obtain the evolution equations for the
variance of the fluctuations,

\[
\frac{dC_{ij}}{dt} = \sum_m \Gamma_{im} C_{mj} + \sum_n \Gamma_{jn} C_{in} + D_{ij},
\]

where we have written $C_{ij} = C_{ji} = \langle \alpha_i \alpha_j \rangle$. In the parameter regime
where the macroscopic system follows a limit cycle, the coefficients $\Gamma$
and D will be periodic functions of time, and the equations governing the 
variance will not admit a stable steady-state. Physically, there is no mech- 
anism to control fluctuations tangent to the limit cycle, and the variance 
grows unbounded in that direction. Trajectories perturbed away from 
the limit cycle are drawn back, however, and so we expect the variance 
of fluctuations perpendicular to the limit cycle to reach a steady-state. 
We introduce a change of basis, with s tangent to the limit cycle, and 
r perpendicular (Figure 11.2). The transformation matrix to the new 
coordinate system is given by the rotation: 


\[
U(t) = \begin{bmatrix} \cos\phi(t) & \sin\phi(t) \\ -\sin\phi(t) & \cos\phi(t) \end{bmatrix},
\]

where $\phi(t)$ is related to the macroscopic rates of $x_1$ and $x_2$ (denoted by
$f_1$ and $f_2$, respectively),

\[
-\sin\phi(t) = \frac{f_1(t)}{\sqrt{f_1^2(t) + f_2^2(t)}}, \qquad
\cos\phi(t) = \frac{f_2(t)}{\sqrt{f_1^2(t) + f_2^2(t)}}.
\]

Figure 11.3: Fluctuations about the limit cycle. a) Although the
variance of the fluctuations tangent to the limit cycle grows without
bound, the perpendicular fluctuations are confined to a gully of width
$C'_{rr}$, shown as a dashed line. b) The blue curve is the result of a stochastic
simulation of the system. The trajectory is confined to a region
very close to the macroscopic limit cycle.

With the change of basis, the evolution equation for the variance $C'$ in
the r-s coordinate system is given by,

\[
\frac{dC'}{dt} = (\Gamma' + R) C' + [(\Gamma' + R) C']^T + D',
\]

where $\Gamma'$ and $D'$ are the transformed matrices,

\[
\Gamma' = U(t) \cdot \Gamma(t) \cdot U^T(t), \qquad
D' = U(t) \cdot D(t) \cdot U^T(t),
\]

and R is the rate of rotation of the coordinate frame itself,

\[
R = \frac{d\phi}{dt} \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}.
\]

In this new coordinate system, the variance $C'_{rr}$ decouples from the
tangential fluctuations, and is characterized by the single evolution
equation,

\[
\frac{dC'_{rr}}{dt} = 2 \Gamma'_{rr} C'_{rr} + D'_{rr},
\]

that converges to a stable limit cycle. After the transients have passed,
the trajectory in the original $n_1$-$n_2$ coordinate system can be anywhere
along the limit cycle, but the perpendicular fluctuations are confined to a
narrow gully with variance $C'_{rr}$, shown in Figure 11.3a as a dashed
curve confining the macroscopic limit cycle. Here, a = 5, b = 10, and
$\Omega = 1000$. The blue curve (Figure 11.3b) is a stochastic simulation of the
same system (compare with Gillespie (1977), Figure 19). Again, the linear
noise approximation captures the statistics of the randomly generated
trajectories very well.
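
The rotating change of basis is easy to check numerically at a single instant. The sketch below builds the rotation from the macroscopic rates, assuming the sign convention $-\sin\phi = f_1/\sqrt{f_1^2 + f_2^2}$ and $\cos\phi = f_2/\sqrt{f_1^2 + f_2^2}$, so that the second row of U points along the tangent of the limit cycle; the function names are hypothetical.

```python
import numpy as np


def rotating_basis(f1, f2):
    """Rotation U from (n1, n2) to (r, s) at one instant.

    Assumes -sin(phi) = f1/|f| and cos(phi) = f2/|f|, so the s axis
    (second row of U) is tangent to the macroscopic trajectory.
    """
    norm = np.hypot(f1, f2)
    sin_phi = -f1 / norm
    cos_phi = f2 / norm
    return np.array([[cos_phi, sin_phi],
                     [-sin_phi, cos_phi]])


def transform(U, M):
    """Transform a matrix of coefficients into the rotated frame: U M U^T."""
    return U @ M @ U.T


if __name__ == "__main__":
    U = rotating_basis(3.0, 4.0)
    tangent = np.array([3.0, 4.0]) / 5.0
    print(U @ tangent)   # the tangent has no r component
    print(U @ U.T)       # U is orthogonal
```

In a full calculation U would be re-evaluated along the macroscopic trajectory and used to transform the periodic coefficient matrices at every time step.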


11.3 Stochastic Resonance


• B. McNamara and K. Wiesenfeld (1989) “Theory of stochastic resonance,”
Physical Review A 39: 4854.


Recall from Section 6.7 on page 143 that the escape probability from 
a potential well depends exponentially upon the well depth. As a con- 
sequence, small changes in the well-depth can have exponentially larger 
effects on the hopping rate between wells. 
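
A short worked step may make the exponential sensitivity concrete. It assumes, as a sketch only, that the hopping rate has the Arrhenius form $r = r_0 e^{-\Delta U / D}$ with barrier height $\Delta U$ and noise strength D, and that the well depth is changed by a small amount $\pm\varepsilon$; the prefactor $r_0$ and the normalization of D in the exponent are assumptions that depend on the convention adopted in Section 6.7.

```latex
r = r_0 \, e^{-\Delta U / D}
\quad \Longrightarrow \quad
r_{\pm} = r_0 \, e^{-(\Delta U \mp \varepsilon)/D} = r \, e^{\pm \varepsilon / D},
\qquad
\frac{r_{+}}{r_{-}} = e^{2 \varepsilon / D}.
```

Under this assumption a change $\varepsilon$ comparable to D already alters the ratio of the two hopping rates by a factor of about $e^2 \approx 7.4$, even when $\varepsilon$ is much smaller than $\Delta U$.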

For example, suppose in addition to a symmetric, two-well potential 
there is a small, periodic perturbation to the potential (Figure 11.4), 


\[
U(x) \rightarrow U(x) + \varepsilon \cos \omega_s t \qquad (\varepsilon \ll 1).
\]


The particle in the well is subject to Brownian motion of variance D,
and for a fixed signal magnitude $\varepsilon$ and frequency $\omega_s$, the system will
exhibit a maximum in the signal-to-noise ratio as the white noise variance
D is varied (Figure 11.5). Here, the signal is the component of the power
spectrum centered on $\omega_s$.
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Figure 11.4: Stochastic resonance. A) Consider the hopping of a par- 
ticle between the wells of a symmetric double-well potential. The rate of 
hopping from —c to c is the same as the rate of hopping from c to —c. B) 
Impose upon this potential a second, very small perturbation that results 
in the tilting of the double-well with a prescribed frequency $\omega_s$, called
the signal. Over a range of noise strengths, the hopping probability will 
become slaved to the external perturbation. As the magnitude of the 
noise is increased, the signal-to-noise ratio will attain a maximum. This 
ability of the system to turn noise into an amplification of a weak signal 
(rather than a corruption) is called stochastic resonance. Redrawn from 
Figure 2 of McNamara and Wiesenfeld (1989). 


[Figure 11.5 plot: signal-to-noise ratio versus the variance of the white noise forcing, D (Hz).]


Figure 11.5: Stochastic resonance. As the variance in the white noise 
forcing (D) is varied, the component of the power spectrum lying at
frequency $\omega_s$ is enhanced well above the background white noise. This effect
is manifest as a maximum in the signal-to-noise ratio. Redrawn from 
Figure 9b of McNamara and Wiesenfeld (1989). 
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Figure 11.6: Mathematical foundations of the Kalman filter. The 
Kalman filter combines fundamental ideas spread across several areas to 
obtain an estimate of the state in the presence of noise in the system 
and noise in the observations. Redrawn from Figure 1.1 of Grewal and 
Andrews Kalman Filtering (2001). 


11.4 Kalman Filter 


• R. E. Kalman (1960) “A new approach to linear filtering and prediction
problems,” Transactions of the ASME, Journal of Basic Engineering 82: 35.


• M. S. Grewal and A. P. Andrews, Kalman Filtering: Theory and Practice Using
MATLAB, 2nd Ed. (Wiley Interscience, 2001).


A problem that is of central concern in control theory is how to es- 
timate the trajectory of a system, perturbed by noise, given a history of 
observations corrupted by noise. For example, given the following model 
for the trajectory x with known input u, and the set of observations z,

\[
\frac{dx}{dt} = F x + G \eta(t) + C u, \tag{11.1}
\]
\[
z = H x + \xi(t) + D u, \tag{11.2}
\]

where $\eta(t)$ and $\xi(t)$ are zero-mean (uncorrelated) white noise,

\[
\langle \eta(t_1) \eta(t_2) \rangle = Q(t_1) \delta(t_1 - t_2), \tag{11.3}
\]
\[
\langle \xi(t_1) \xi(t_2) \rangle = R(t_1) \delta(t_1 - t_2), \tag{11.4}
\]
\[
\langle \eta(t_1) \xi(t_2) \rangle = 0, \tag{11.5}
\]

how can the state of the system x be estimated? Obviously in the absence
of noise, $\hat{x} = H^{-1} \cdot z$ provides an exact estimate of the state. The question
is: with the state dynamics and the observation data corrupted by noise,
how can we optimally extract the estimate $\hat{x}$?

First, we must define in what sense an estimate is optimal. We seek
an estimate $\hat{x}(t)$ which is a linear function of the observations z(t) which
minimizes

\[
\left\langle [x(t) - \hat{x}(t)]^T \cdot M \cdot [x(t) - \hat{x}(t)] \right\rangle,
\]

where M is a given symmetric positive-definite matrix. We say that $\hat{x}(t)$ is
optimal in the mean-squared sense, or that $\hat{x}(t)$ is a least-squares estimate
of x(t). To be precise, the continuous-time filter is called the Kalman-Bucy
filter, while the discrete-time filter is called the Kalman filter. The
discrete-time implementation is more straightforward to explain.


Discrete Kalman filter 


We have a state $x_k$ that evolves through (known) deterministic dynamics
$\Phi$ and some white noise forcing $w_k$,

\[
x_{k+1} = \Phi \cdot x_k + w_k,
\]

where $w_k$ is a vector of random variables drawn from a normal distribution
$N(0, Q_k)$. Our knowledge of the true state $x_k$ comes from observations
$z_k$ that are also corrupted by white noise $v_k$,

\[
z_k = H \cdot x_k + v_k,
\]

where $v_k$ is a vector of random variables drawn from a normal distribution
$N(0, R_k)$, that are all independent of $w_k$.

Given some initial estimate of the state $\hat{x}_0$ (here, the hat indicates
an estimate of the unknown state $x_k$) and the variance of our estimate
$P_0 = \langle \hat{x}_0 \cdot \hat{x}_0^T \rangle$, the Kalman filter provides an algorithm for updating our
estimate of the state $\hat{x}_0$ and the variance $P_0$ using an auxiliary quantity
called the Kalman gain matrix. The Kalman gain matrix $K_k$ tells us how
much to trust the observation $z_k$ in refining our estimate $\hat{x}_k$ beyond the
deterministic dynamics $\Phi$. This is done in such a way that the estimate
$\hat{x}_k$ is optimal in the mean-squared sense described above.

Specifically, we use $\Phi$ to form an initial estimate of the state $\hat{x}_k(-)$
and of the variance $P_k(-)$,

\[
\hat{x}_k(-) = \Phi_{k-1} \cdot \hat{x}_{k-1}(+), \qquad
P_k(-) = \Phi_{k-1} \cdot P_{k-1}(+) \cdot \Phi_{k-1}^T + Q_{k-1}.
\]

Then we use the Kalman gain matrix $K_k$,

\[
K_k = P_k(-) \cdot H_k^T \cdot \left[ H_k \cdot P_k(-) \cdot H_k^T + R_k \right]^{-1},
\]

and the observation $z_k$ to refine our original estimates,

\[
\hat{x}_k(+) = \hat{x}_k(-) + K_k \cdot \left[ z_k - H_k \cdot \hat{x}_k(-) \right], \qquad
P_k(+) = \left[ I - K_k \cdot H_k \right] \cdot P_k(-),
\]

and the process is repeated at each time-step.
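
The predict, gain and update steps above can be sketched as a single function. This is a minimal illustration, assuming time-independent matrices $\Phi$, H, Q and R and NumPy arrays for all quantities; the function name `kalman_step` and its argument order are hypothetical.

```python
import numpy as np


def kalman_step(x_prev, P_prev, z, Phi, H, Q, R):
    """One predict/update cycle of the discrete Kalman filter.

    x_prev, P_prev : previous estimate x(+) and its variance P(+)
    z              : current observation
    Phi, H, Q, R   : dynamics, observation matrix and noise covariances
                     (assumed constant in this sketch)
    Returns the refined estimate x(+), variance P(+) and the gain K.
    """
    # Prediction using the deterministic dynamics: x(-), P(-).
    x_pred = Phi @ x_prev
    P_pred = Phi @ P_prev @ Phi.T + Q
    # Kalman gain: how much to trust the observation.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    # Refinement using the observation: x(+), P(+).
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x_prev)) - K @ H) @ P_pred
    return x_new, P_new, K


if __name__ == "__main__":
    one = np.array([[1.0]])
    x, P, K = kalman_step(np.array([0.0]), one, np.array([2.0]),
                          Phi=one, H=one, Q=np.array([[0.0]]), R=one)
    print(x, P, K)
```

In the scalar usage example the prior variance and the observation variance are equal, so the gain is one half and the refined estimate lies midway between the prediction and the observation.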


11.5 Novikov-Furutsu-Donsker Relation 


• E. A. Novikov (1965) “Functionals and the random-force method in turbulence
theory,” Soviet Physics - JETP 20: 1290-1294.


• K. Furutsu (1963) “On the statistical theory of electromagnetic waves in a
fluctuating medium,” Journal of the Research of the National Bureau of Standards
D67: 303.


• M. D. Donsker (1964) “On function space integrals” in Analysis in Function
Space, eds. W. T. Martin & I. Segal, MIT Press; pp 17-30.

For a Gaussian-distributed multiplicative noise source $A_1(t)$,

\[
\frac{dy}{dt} = A_0 y + \alpha A_1(t) y, \tag{11.6}
\]

the Novikov theorem allows the correlation of the process y(t) depending
implicitly upon $A_1(t)$ to be calculated in terms of the autocorrelation
of $A_1(t)$ and the functional derivative of y(t) with respect to the
fluctuations. To leading order, we simply recover Bourret’s approximation,
although the Novikov theorem does open the way to a variational estimate
of higher-order terms as shown in Section 11.5.1.

If we average the evolution equation (11.6) directly, we have,

\[
\frac{d}{dt} \langle y(t) \rangle = A_0 \langle y(t) \rangle + \alpha \langle A_1(t) y(t) \rangle, \tag{11.7}
\]

for the scalar function y(t). The correlation of the coefficient $A_1(t)$ and
the process y(t) can be calculated using Novikov’s theorem,

\[
\langle A_1(t) y(t) \rangle = \int_0^t \langle A_1(t) A_1(t') \rangle \left\langle \frac{\delta y(t)}{\delta A_1(t')} \right\rangle dt', \tag{11.8}
\]

where $\delta y(t) / \delta A_1(t')$ is the functional derivative of y(t) with respect to the
Gaussian random function $A_1(t')$. The Green’s function $G_0(t, t')$ for the
noiseless operator $\left[ \frac{d}{dt} - A_0 \right]$ allows y(t) to be re-written as an integral
equation,

\[
y(t) = G_0(t, 0) y(0) + \alpha \int_0^t G_0(t, \tau) A_1(\tau) y(\tau) \, d\tau.
\]

Taking the functional derivative with respect to $A_1(t')$,

\[
\begin{aligned}
\frac{\delta y(t)}{\delta A_1(t')} &= 0 + \alpha \int_0^t G_0(t, \tau) \left[ \delta(\tau - t') y(\tau) + A_1(\tau) \frac{\delta y(\tau)}{\delta A_1(t')} \right] d\tau \\
&= \alpha G_0(t, t') y(t') + \alpha \int_0^t G_0(t, \tau) A_1(\tau) \frac{\delta y(\tau)}{\delta A_1(t')} \, d\tau,
\end{aligned}
\tag{11.9}
\]

which is an integral equation for the functional derivative that can be
solved iteratively if $\alpha$ is small. Retaining only the leading term, and
averaging,

\[
\left\langle \frac{\delta y(t)}{\delta A_1(t')} \right\rangle \approx \alpha G_0(t, t') \langle y(t') \rangle.
\]

[In Section 11.5.1, we consider a variational approximation of the
higher-order terms.] With substitution into (11.8), (11.7) becomes,

\[
\frac{d}{dt} \langle y(t) \rangle = A_0 \langle y(t) \rangle + \alpha^2 \int_0^t \langle A_1(t) A_1(t') \rangle \, G_0(t, t') \, \langle y(t') \rangle \, dt',
\]

which is identical to Bourret’s approximation of the evolution equation for
the first moment (8.20) written in the original notation with
$G_0(t, t') = e^{A_0 (t - t')}$ and $y(t) = e^{A_0 (t - t')} y(t')$.


* * * *


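This proof of Eq. 11.8 follows the original by Novikov, with annotations.
In full, the Novikov-Furutsu-Donsker relation reads:

The correlation of a Gaussian random variable $\gamma(t)$ with a
function depending upon that variable $x[\gamma(t); t]$ is:

\[
\langle \gamma(t) x(t) \rangle = \int \langle \gamma(t) \gamma(t_1) \rangle \left\langle \frac{\delta x(t)}{\delta \gamma(t_1)} \right\rangle dt_1.
\]

Proof:
The functional $x[\gamma]$ is expanded as a functional Taylor series about
the point $\gamma(t) = 0$:

\[
\begin{aligned}
x[\gamma(t)] &= x[0] + \int \left( \frac{\delta x}{\delta \gamma_1} \right)_{\gamma = 0} \gamma(t_1) \, dt_1
+ \frac{1}{2!} \iint \left( \frac{\delta^2 x}{\delta \gamma_1 \delta \gamma_2} \right)_{\gamma = 0} \gamma(t_1) \gamma(t_2) \, dt_1 dt_2 + \ldots \\
&= x[0] + \sum_{n=1}^{\infty} \frac{1}{n!} \int \left( \frac{\delta^n x}{\delta \gamma_1 \ldots \delta \gamma_n} \right)_{\gamma = 0} \gamma(t_1) \ldots \gamma(t_n) \, dt_1 \ldots dt_n.
\end{aligned}
\tag{11.10}
\]

Here, for brevity,

\[
\left( \frac{\delta^n x}{\delta \gamma_1 \ldots \delta \gamma_n} \right)_{\gamma = 0} \equiv
\left( \frac{\delta^n x(t)}{\delta \gamma(t_1) \ldots \delta \gamma(t_n)} \right)_{\gamma(t_1) = \ldots = \gamma(t_n) = 0}.
\]

The functional derivative is evaluated at $\gamma = 0$ (zero noise) and as such
constitutes a non-random function so that

\[
\left\langle \left( \frac{\delta^n x}{\delta \gamma_1 \ldots \delta \gamma_n} \right)_{\gamma = 0} \right\rangle =
\left( \frac{\delta^n x}{\delta \gamma_1 \ldots \delta \gamma_n} \right)_{\gamma = 0}.
\]

Multiplying (11.10) by $\gamma(t)$, and taking the ensemble average (the
zeroth-order term drops out because $\langle \gamma(t) \rangle = 0$),

\[
\langle \gamma(t) x(t) \rangle = \sum_{n=1}^{\infty} \frac{1}{n!} \int \left( \frac{\delta^n x}{\delta \gamma_1 \ldots \delta \gamma_n} \right)_{\gamma = 0} \langle \gamma(t) \gamma(t_1) \ldots \gamma(t_n) \rangle \, dt_1 \ldots dt_n. \tag{11.11}
\]

For a Gaussian random variable, the mean value of the product of an odd
number of terms vanishes, while for an even number of terms, the mean
value is equal to the sum of products of all pair-wise combinations:

\[
\langle \gamma(t) \gamma(t_1) \ldots \gamma(t_n) \rangle = \sum_{\alpha = 1}^{n} \langle \gamma(t) \gamma(t_\alpha) \rangle \langle \gamma(t_1) \ldots \gamma(t_{\alpha - 1}) \gamma(t_{\alpha + 1}) \ldots \gamma(t_n) \rangle,
\]

(where $\langle \gamma(t_1) \ldots \gamma(t_{\alpha - 1}) \gamma(t_{\alpha + 1}) \ldots \gamma(t_n) \rangle$ can obviously be further divided
into pair-wise combinations). The functional derivatives in the integrals
of (11.10) are all symmetric with respect to the labelling of the arguments
$t_1, \ldots, t_n$, so that there are n equivalent ways to choose $\alpha$ above,

\[
\langle \gamma(t) \gamma(t_1) \ldots \gamma(t_n) \rangle = n \, \langle \gamma(t) \gamma(t_1) \rangle \langle \gamma(t_2) \ldots \gamma(t_n) \rangle.
\]

Substituting into the cross-correlation (11.11),

\[
\langle \gamma(t) x(t) \rangle = \int \langle \gamma(t) \gamma(t_1) \rangle \left[ \sum_{n=1}^{\infty} \frac{1}{(n-1)!} \int \left( \frac{\delta^n x}{\delta \gamma_1 \ldots \delta \gamma_n} \right)_{\gamma = 0} \langle \gamma(t_2) \ldots \gamma(t_n) \rangle \, dt_2 \ldots dt_n \right] dt_1. \tag{11.12}
\]

On the other hand, taking the functional derivative of (11.10) directly,
again taking advantage of the symmetry,

\[
\begin{aligned}
\frac{\delta x[\gamma(t)]}{\delta \gamma(t')} &= \sum_{n=1}^{\infty} \frac{1}{(n-1)!} \int \left( \frac{\delta^n x}{\delta \gamma_1 \ldots \delta \gamma_n} \right)_{\gamma = 0} \frac{\delta \gamma(t_1)}{\delta \gamma(t')} \, \gamma(t_2) \ldots \gamma(t_n) \, dt_1 \ldots dt_n \\
&= \sum_{n=1}^{\infty} \frac{1}{(n-1)!} \int \left( \frac{\delta^n x}{\delta \gamma_1 \ldots \delta \gamma_n} \right)_{\gamma = 0} \delta(t_1 - t') \, \gamma(t_2) \ldots \gamma(t_n) \, dt_1 \ldots dt_n \\
&= \sum_{n=1}^{\infty} \frac{1}{(n-1)!} \int \left( \frac{\delta^n x}{\delta \gamma(t') \delta \gamma_2 \ldots \delta \gamma_n} \right)_{\gamma = 0} \gamma(t_2) \ldots \gamma(t_n) \, dt_2 \ldots dt_n.
\end{aligned}
\]

Call $t' \equiv t_1$, and take the ensemble average,

\[
\left\langle \frac{\delta x(t)}{\delta \gamma(t_1)} \right\rangle = \sum_{n=1}^{\infty} \frac{1}{(n-1)!} \int \left( \frac{\delta^n x}{\delta \gamma_1 \ldots \delta \gamma_n} \right)_{\gamma = 0} \langle \gamma(t_2) \ldots \gamma(t_n) \rangle \, dt_2 \ldots dt_n. \tag{11.13}
\]

Substituting the left-side of (11.13) into (11.12) yields the
Novikov-Furutsu-Donsker relation. For a causal process, x(t) will only depend
upon values of $\gamma(t_1)$ that precede it, so,

\[
\langle \gamma(t) x(t) \rangle = \int_0^t \langle \gamma(t) \gamma(t_1) \rangle \left\langle \frac{\delta x(t)}{\delta \gamma(t_1)} \right\rangle dt_1. \tag{11.14}
\]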
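
For a single zero-mean Gaussian variable $\gamma$ with variance $\sigma^2$ the relation reduces to $\langle \gamma \, x(\gamma) \rangle = \sigma^2 \langle dx/d\gamma \rangle$, which can be checked numerically. The sketch below is only a finite-dimensional illustration of the functional statement, using Gauss-Hermite quadrature to evaluate the Gaussian averages; the function names and the quadrature order are assumptions chosen for illustration.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def gaussian_mean(g, sigma, order=40):
    """Average of g(gamma) for gamma ~ N(0, sigma^2) by Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(order)   # weight function exp(-x^2 / 2)
    return np.sum(weights * g(sigma * nodes)) / np.sqrt(2.0 * np.pi)


def novikov_sides(x, dx, sigma):
    """Return (<gamma x(gamma)>, sigma^2 <dx/dgamma>) for comparison."""
    lhs = gaussian_mean(lambda g: g * x(g), sigma)
    rhs = sigma ** 2 * gaussian_mean(dx, sigma)
    return lhs, rhs


if __name__ == "__main__":
    # Two test functions: a cubic and an exponential.
    print(novikov_sides(lambda g: g ** 3, lambda g: 3.0 * g ** 2, 0.7))
    print(novikov_sides(np.exp, np.exp, 0.7))
```

The two numbers in each printed pair should agree to quadrature accuracy; for the cubic both equal $3\sigma^4$.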


11.5.1 Variational Approximation 


• F. J. Dyson (1949) “The radiation theories of Tomonaga, Schwinger, and
Feynman,” Physical Review 75: 486-502.


• L. Rosenberg and S. Tolchin (1973) “Variational methods for many body
problems,” Physical Review A 8: 1702-1709.


Beginning with equation (11.9),

\[
\frac{\delta y(t)}{\delta A_1(t')} = \alpha G_0(t, t') y(t') + \alpha \int_0^t G_0(t, \tau) A_1(\tau) \frac{\delta y(\tau)}{\delta A_1(t')} \, d\tau,
\]

it is convenient to assume a form for the functional derivative

\[
\frac{\delta y(t)}{\delta A_1(t')} = \alpha G(t, t') y(t'), \tag{11.15}
\]

for some noiseless function $G(t, t')$. With substitution,

\[
G(t, t') = G_0(t, t') + \alpha \int G_0(t, \tau) A_1(\tau) G(\tau, t') \, d\tau. \tag{11.16}
\]

This is sometimes called Dyson’s equation, and a variational principle
derived from this equation has a long history (see Rosenberg and Tolchin).
A variational estimate of $G(t, t')$ correct to first order is:

\[
\begin{aligned}
G_v(t, t') = {} & G_0(t, t') + \alpha \int G_1(t, \tau) A_1(\tau) G_0(\tau, t') \, d\tau \\
& + \alpha \int G_0(t, \tau) A_1(\tau) G_2(\tau, t') \, d\tau - \alpha \int G_1(t, \tau) A_1(\tau) G_2(\tau, t') \, d\tau \\
& + \alpha^2 \iint G_1(t, \tau_1) A_1(\tau_1) G_0(\tau_1, \tau_2) A_1(\tau_2) G_2(\tau_2, t') \, d\tau_1 d\tau_2.
\end{aligned}
\tag{11.17}
\]

Here, $G_1(t, t')$ and $G_2(t, t')$ are two trial functions. The stationarity of
this estimate with respect to independent variations in the trial functions
about the exact solution can easily be demonstrated using (11.16),

\[
\begin{aligned}
\left. \frac{\delta G_v(t, t')}{\delta G_1(t_1, t_1')} \right|_{G_1 = G_2 = G}
= {} & -\alpha \, \delta(t - t_1) A_1(t_1') \times \\
& \left[ G(t_1', t') - G_0(t_1', t') - \alpha \int G_0(t_1', \tau) A_1(\tau) G(\tau, t') \, d\tau \right] \\
= {} & 0,
\end{aligned}
\]

and similarly for variations in $G_2(t, t')$. The variational approximation
can be used as the starting point for a Rayleigh-Ritz scheme to determine
the optimal trial functions via some expansion of $G_1(t, t')$ and $G_2(t, t')$
in a suitable orthonormal basis. Here, however, we shall use $G_v(t, t')$
directly.

By assumption, $G(t, t')$ is a noiseless function, while $G_v(t, t')$ is not,
so we take the ensemble average of (11.17) (assuming the trial functions
are deterministic),

\[
\begin{aligned}
G(t, t') \approx \langle G_v(t, t') \rangle = {} & G_0(t, t') + \alpha \int G_1(t, \tau) \langle A_1(\tau) \rangle G_0(\tau, t') \, d\tau \\
& + \alpha \int G_0(t, \tau) \langle A_1(\tau) \rangle G_2(\tau, t') \, d\tau - \alpha \int G_1(t, \tau) \langle A_1(\tau) \rangle G_2(\tau, t') \, d\tau \\
& + \alpha^2 \iint G_1(t, \tau_1) G_0(\tau_1, \tau_2) G_2(\tau_2, t') \langle A_1(\tau_1) A_1(\tau_2) \rangle \, d\tau_1 d\tau_2.
\end{aligned}
\]

$\langle G_v(t, t') \rangle$ still obeys the variational principle, as one can show by
multiplying (11.16) by $A_1(\tau)$, and taking the ensemble average (provided
the trial functions are not random). Furthermore, the averaged solution
should provide a better estimate of $G(t, t')$ than the one obtained
by successive iteration of the original integral equation (11.9).

With the convenient choice $G_1(t, t') = G_2(t, t') = G_0(t, t')$ and for
zero-mean noise, the averaged solution is considerably simplified:

\[
G(t, t') = G_0(t, t') + \alpha^2 \iint G_0(t, \tau_1) G_0(\tau_1, \tau_2) G_0(\tau_2, t') \langle A_1(\tau_1) A_1(\tau_2) \rangle \, d\tau_1 d\tau_2.
\]

Substituting back into the variational derivative (11.15),

\[
\frac{\delta y(t)}{\delta A_1(t')} = \alpha G_0(t, t') y(t') + \alpha^3 \left[ \iint G_0(t, \tau_1) G_0(\tau_1, \tau_2) G_0(\tau_2, t') \langle A_1(\tau_1) A_1(\tau_2) \rangle \, d\tau_1 d\tau_2 \right] y(t').
\]

Finally, the Novikov relation reads:

\[
\begin{aligned}
\langle A_1(t) y(t) \rangle = {} & \alpha \int_0^t G_0(t, t') \langle A_1(t) A_1(t') \rangle \langle y(t') \rangle \, dt' \\
& + \alpha^3 \int_0^t \left[ \iint G_0(t, \tau_1) G_0(\tau_1, \tau_2) G_0(\tau_2, t') \langle A_1(\tau_1) A_1(\tau_2) \rangle \, d\tau_1 d\tau_2 \right] \times \\
& \qquad \times \langle A_1(t) A_1(t') \rangle \langle y(t') \rangle \, dt'.
\end{aligned}
\]

The first term is Bourret’s approximation, while the second term is the
higher-order correction obtained from the variational principle.


APPENDIX A


REVIEW OF CLASSICAL PROBABILITY 


The mathematical theory of probability deals with phenomena en masse, 
i.e. observations (experiments, trials) which can be repeated many times 
under similar conditions. Its principal concern is with the numerical
characteristics of the phenomenon under study, i.e. quantities which take on
various numerical values depending upon the result of the observation.
Such quantities are called random variables; e.g.


• the number of points which appear when a die is tossed.


• the number of calls which arrive at a telephone station during a
given time interval.


• the lifetime of an electric light bulb.

• the error made in measuring a physical quantity, ...


A random variable $\xi$ is regarded as specified if one knows the cumulative distribution function,

\[
F(x) = P\{\xi \le x\}, \qquad x \in (-\infty, \infty), \tag{A.1}
\]

where $P$ is the probability of occurrence of the relation in $\{\cdot\}$. The cumulative distribution function $F(x)$ can be shown to have the following properties,

1. $F(-\infty) = 0$; $F(\infty) = 1$;

2. $x_1 < x_2 \Rightarrow F(x_1) \le F(x_2)$;

3. $F(x^+) = F(x)$ (continuity from the right).

It is convenient to introduce the probability density function,

\[
f(x) = \frac{dF}{dx}, \tag{A.2}
\]

of the random variable $\xi$. Since $F(x)$ might not have a derivative for every $x$, one can distinguish several types of random variables, in particular:


1. Continuous random variables. We assume that the number of points where $f(x)$ doesn't exist is a countable set (i.e. points of discontinuity are comparatively "few"). Then it can be shown that,

• Property 2 $\Rightarrow f(x) \ge 0$.

• Property 1 and Eq. A.2 $\Rightarrow \int_{-\infty}^{\infty} f(x)\, dx = 1$.

• Eq. A.2 $\Rightarrow F(x) = \int_{-\infty}^{x} f(x')\, dx'$, i.e. $F(x_2) - F(x_1) = \int_{x_1}^{x_2} f(x)\, dx$.

From this, and from $P\{x_1 < \xi \le x_2\} = F(x_2) - F(x_1)$, it follows that,

\[
P\{x_1 < \xi \le x_2\} = \int_{x_1}^{x_2} f(x)\, dx .
\]

In particular, if $\Delta x$ is sufficiently small, then,

\[
P\{x \le \xi \le x + \Delta x\} \approx f(x)\, \Delta x,
\]

and so one also has the definition,

\[
f(x) = \lim_{\Delta x \to 0} \frac{P\{x \le \xi \le x + \Delta x\}}{\Delta x} .
\]

Note this definition leads to a frequency interpretation for the probability density function. To determine $f(x)$ for a given $x$, we perform the experiment $n$ times, and count the number of trials $\Delta n(x)$ such that $x \le \xi \le x + \Delta x$, so

\[
f(x)\, \Delta x \approx \frac{\Delta n(x)}{n} .
\]


Figure A.1: Typical distribution function for a discrete process.


2. Discrete (and mixed) random variables. These have a cumulative distribution function $F(x)$ that resembles a staircase (Figure A.1). We have,

\[
p_n = P\{\xi = x_n\} = F(x_n) - F(x_n^-), \tag{A.3}
\]
\[
\sum_n p_n = F(\infty) - F(-\infty) = 1. \tag{A.4}
\]

In general,

\[
F(x) = \sum_n P\{\xi = x_n\}, \qquad n \text{ such that } x_n \le x. \tag{A.5}
\]

If a random variable is not of continuous type, then one doesn't usually associate with it a probability density function, since $F(x)$ is not differentiable in the ordinary sense. Nevertheless, we extend the concept of function to include distributions, in particular the Dirac $\delta$-function, specified by the integral property,

\[
\int_{-\infty}^{\infty} \phi(x)\, \delta(x - x_0)\, dx = \phi(x_0),
\]

where $\phi$ is continuous at $x_0$, though otherwise arbitrary.

Then one can show that if $F(x)$ is discontinuous at $x_0$, we have,

\[
\left. \frac{dF}{dx} \right|_{x = x_0} = k \cdot \delta(x - x_0); \qquad k = F(x_0^+) - F(x_0^-) \quad \text{(the step height)}.
\]

In particular, if $\theta(x)$ is the Heaviside step function,

\[
\theta(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0, \end{cases}
\]

then,

\[
\frac{d\theta}{dx} = \delta(x)
\]

(see Figure A.2). Using this, Eqs. A.4 and A.5 give,

\[
f(x) = \sum_n p_n\, \delta(x - x_n).
\]


Figure A.2: The Dirac delta function $\delta(x)$ is a distribution that can be thought of as the derivative of the Heaviside step function $\theta(x)$.


An important, and subtle, result is the following.

Existence theorem: Given a function $G(x)$ such that,

• $G(-\infty) = 0$; $G(\infty) = 1$;

• $x_1 < x_2 \Rightarrow G(x_1) \le G(x_2)$;

• $G(x^+) = G(x)$ (continuity from the right),

one can find an experiment, and an associated random variable, such that its cumulative distribution function $F(x) = G(x)$.

Similarly, one can determine a random variable having as a probability density a given function $g(x)$, provided $g(x) \ge 0$ and $\int_{-\infty}^{\infty} g(x)\, dx = 1$. This is done by determining first,

\[
G(x) = \int_{-\infty}^{x} g(x')\, dx',
\]

and then invoking the existence theorem.


A.1 Statistics of a Random Variable 


1. Expectation value:

\[
E\{\xi\} = \begin{cases} \displaystyle \int_{-\infty}^{\infty} x f(x)\, dx, & \text{if } \xi \text{ is continuous,} \\[2ex] \displaystyle \sum_n x_n p_n, & \text{if } \xi \text{ is discrete.} \end{cases}
\]

2. Variance:

\[
\sigma^2 = E\{(\xi - E\{\xi\})^2\} = \int_{-\infty}^{\infty} (x - E\{\xi\})^2 f(x)\, dx,
\]

from which we have the important result:

\[
\sigma^2 = E\{\xi^2\} - (E\{\xi\})^2 .
\]

Note that $\sigma^2$ is also called the dispersion, and $\sigma$ the standard deviation. If the variance is vanishingly small, then the random variable $\xi$ becomes a deterministic, or sure, variable:

\[
\sigma^2 = 0 \iff \xi \text{ is the sure variable } \langle \xi \rangle. \tag{A.6}
\]

3. Moments:

\[
m_k = E\{\xi^k\} = \int_{-\infty}^{\infty} x^k f(x)\, dx .
\]

Note that $m_0 = 1$ and $m_1 = E\{\xi\}$. Actually, one frequently uses the central moments,

\[
\mu_k = E\{(\xi - E\{\xi\})^k\} = \begin{cases} \displaystyle \int_{-\infty}^{\infty} (x - E\{\xi\})^k f(x)\, dx, \\[2ex] \displaystyle \sum_n (x_n - E\{\xi\})^k p_n, \end{cases}
\]

where, again, we note that $\mu_0 = 1$, $\mu_1 = 0$ and $\mu_2 = \sigma^2$. When we discuss multivariate distributions, it will be convenient to introduce the following notation: we denote the second central moment by the double angled brackets,

\[
\langle\langle XY \rangle\rangle = \langle XY \rangle - \langle X \rangle \langle Y \rangle. \tag{A.7}
\]

4. Correlation Coefficient: It is sometimes useful to express the degree of correlation among variables by some dimensionless quantity. For that purpose, we introduce the correlation coefficient,

\[
\rho = \frac{\langle\langle XY \rangle\rangle}{\sigma_X \sigma_Y}. \tag{A.8}
\]

Notice that the correlation coefficient is bounded, $-1 \le \rho \le 1$. Furthermore, if $\rho < 0$ we say the two processes $X$ and $Y$ are anti-correlated, while if $\rho > 0$, we say $X$ and $Y$ are correlated. For $\rho = 0$, the two processes are uncorrelated.


5. Characteristic function: The characteristic function is a generating function for the moments of a distribution. There is a one-to-one correspondence between the characteristic function $\phi(\omega)$ and a given probability density $f(x)$. The characteristic function is defined as

\[
\phi(\omega) = E\{e^{i\omega\xi}\} = \int_{-\infty}^{\infty} e^{i\omega x} f(x)\, dx .
\]

Notice this is precisely the Fourier transform of the probability density $f(x)$. A useful result is the moment theorem,

\[
\left. \frac{d^n \phi(\omega)}{d\omega^n} \right|_{\omega = 0} = i^n m_n .
\]

The characteristic function is also useful as an intermediate step in computing the moments of a nonlinear transformation of a given distribution.




6. Cumulants: The characteristic function also generates the cumulants $\kappa_n$ of the random variable,

\[
\ln \phi(\omega) = \sum_{n=1}^{\infty} \frac{\kappa_n}{n!} (i\omega)^n . \tag{A.9}
\]

The cumulants are combinations of the lower moments, e.g.

\[
\begin{aligned}
\kappa_1 &= m_1, \\
\kappa_2 &= m_2 - m_1^2 = \sigma^2, \\
\kappa_3 &= m_3 - 3 m_2 m_1 + 2 m_1^3, \ldots
\end{aligned}
\]

For a Gaussian distribution, one can show that all cumulants beyond the second are zero. The utility of cumulants comes from the identity,

\[
\langle e^{a\xi} \rangle = \sum_{n=0}^{\infty} \frac{a^n}{n!} m_n = \exp\left[ \sum_{n=1}^{\infty} \frac{a^n}{n!} \kappa_n \right], \tag{A.10}
\]

where $a$ is a constant and $\xi$ is a random variable. In that way, cumulants appear in approximations of the average of an exponentiated noise source.
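
As a short worked instance of Eq. A.10 (a sketch, assuming $\xi$ is Gaussian with mean $\mu$ and variance $\sigma^2$), the cumulant series terminates after two terms because $\kappa_1 = \mu$, $\kappa_2 = \sigma^2$ and $\kappa_n = 0$ for $n \ge 3$:

\[
\langle e^{a\xi} \rangle = \exp\left[ a\kappa_1 + \frac{a^2}{2}\kappa_2 \right] = \exp\left[ a\mu + \frac{a^2\sigma^2}{2} \right].
\]

The same result follows from direct integration by completing the square in the exponent,

\[
\int_{-\infty}^{\infty} e^{ax}\, \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(x-\mu)^2}{2\sigma^2} \right] dx = \exp\left[ a\mu + \frac{a^2\sigma^2}{2} \right],
\]

whereas the moment series on the left of Eq. A.10 would have to be summed to all orders to reach the same closed form.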


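For a sequence of random variables $\{\xi\} = \{\xi_1, \xi_2, \ldots\}$, the characteristic function is given analogously by $\phi(\boldsymbol{\omega}) = \langle \exp[i \boldsymbol{\omega} \cdot \boldsymbol{\xi}] \rangle$. In that case, the $n^{\text{th}}$-order cumulant, denoted by double-angled brackets, is,

\[
\langle\langle \xi_i \xi_j \cdots \xi_k \rangle\rangle = \left. \frac{\partial^n}{\partial(i\omega_i)\, \partial(i\omega_j) \cdots \partial(i\omega_k)} \ln \phi(\boldsymbol{\omega}) \right|_{\boldsymbol{\omega} = 0} .
\]

Multivariate cumulants obey a remarkable property: the $n^{\text{th}}$-order cumulant $\langle\langle \xi_i \xi_j \cdots \xi_k \rangle\rangle$ is zero if the elements $\xi_i, \xi_j, \ldots$ are divided into two or more groups that are statistically independent. As a corollary, the cumulant is zero if one of the variables in it is statistically independent of the others.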


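The proof can be found in R. Kubo's "Generalized cumulant expansion method," (1962) Journal of the Physical Society of Japan 17: 1100-1120, and goes as follows. If the variables $\{\xi\} = \{\xi_1, \xi_2, \ldots\}$ are divided into two groups $\{\xi\} = \{\xi'\} \cup \{\xi''\}$ that are statistically independent, then the characteristic function factors,

\[
\begin{aligned}
\phi(\boldsymbol{\omega}) &= \left\langle \exp\left[ i \sum_j \omega_j \xi_j \right] \right\rangle = \left\langle \exp\left[ i \sum_j \omega_j' \xi_j' + i \sum_j \omega_j'' \xi_j'' \right] \right\rangle \\
&= \left\langle \exp\left[ i \sum_j \omega_j' \xi_j' \right] \right\rangle \left\langle \exp\left[ i \sum_j \omega_j'' \xi_j'' \right] \right\rangle = \phi'(\boldsymbol{\omega}') \cdot \phi''(\boldsymbol{\omega}'').
\end{aligned}
\]

Therefore $\ln \phi(\boldsymbol{\omega}) = \ln \phi'(\boldsymbol{\omega}') + \ln \phi''(\boldsymbol{\omega}'')$. If the $n^{\text{th}}$-order cumulant contains members of both sets $\{\xi'\}$ and $\{\xi''\}$, then the partial derivative vanishes,

\[
\langle\langle \xi_i' \cdots \xi_k'' \rangle\rangle = \left. \frac{\partial}{\partial(i\omega_i')} \cdots \frac{\partial}{\partial(i\omega_k'')} \Big[ \ln \phi'(\boldsymbol{\omega}') + \ln \phi''(\boldsymbol{\omega}'') \Big] \right|_{\boldsymbol{\omega} = 0} = 0,
\]

ensuring that the cumulant is zero.


Figure A.3: Common probability density functions. A) Uniform. B) Exponential. C) Normal (Gaussian). D) Cauchy (Lorentz). Redrawn from Gillespie (1992) Markov Processes, Academic Press, Inc.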


A.2 Examples of Cumulative distributions 
and Probability densities 


1. Uniform. The uniform random variable has probability density function,

\[
f(x) = \begin{cases} (b - a)^{-1}, & a \le x \le b, \\ 0, & \text{otherwise} \end{cases} \tag{A.11}
\]

(see Figure A.3A). In the domain $x \in [a, b]$, the cumulative distribution function is correspondingly given by,

\[
F(x) = \int_a^x (b - a)^{-1}\, dx' = \frac{x - a}{b - a}, \tag{A.12}
\]

(see Figure A.4A), with $F(x) = 0$ for $x < a$, and $F(x) = 1$ for $x > b$. A random variable drawn from this probability distribution is said to be "uniformly distributed on $[a, b]$," written as $X = U(a, b)$. The mean and standard deviation are,

\[
\langle X \rangle = \frac{a + b}{2}, \qquad \langle (X - \langle X \rangle)^2 \rangle^{1/2} = \frac{b - a}{2\sqrt{3}}. \tag{A.13}
\]

As a consequence,

\[
\lim_{b \to a} U(a, b) = \text{the sure variable } a. \tag{A.14}
\]


Figure A.4: Common cumulative distribution functions. A) Uniform. B) Exponential. C) Normal (Gaussian). D) Cauchy (Lorentz). The corresponding density functions are shown as dashed lines.


2. Exponential. The exponential random variable has probability density function,

\[
f(x) = a \exp(-a x), \qquad x \ge 0, \tag{A.15}
\]

(see Figure A.3B). For $x \ge 0$, the cumulative distribution function is correspondingly given by,

\[
F(x) = \int_0^x a \exp(-a x')\, dx' = 1 - \exp(-a x), \tag{A.16}
\]

(see Figure A.4B), with $F(x) = 0$ for $x < 0$. A random variable drawn from this distribution is said to be "exponentially distributed with decay constant $a$," written as $X = E(a)$. The mean and standard deviation are,

\[
\langle X \rangle = \langle (X - \langle X \rangle)^2 \rangle^{1/2} = \frac{1}{a}. \tag{A.17}
\]

As a consequence,

\[
\lim_{a \to \infty} E(a) = \text{the sure number } 0. \tag{A.18}
\]


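3. Normal/Gaussian. The normal random variable has probability density function,

\[
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right], \tag{A.19}
\]

(see Figure A.3C). The cumulative distribution function is correspondingly given by,

\[
F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(x' - \mu)^2}{2\sigma^2} \right] dx' = \frac{1}{2} \left( 1 + \operatorname{erf}\left[ \frac{x - \mu}{\sqrt{2\sigma^2}} \right] \right), \tag{A.20}
\]

where

\[
\operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\, dt, \tag{A.21}
\]

is the error function (see Figure A.4C). A random variable drawn from this distribution is said to be "normally (or Gaussian) distributed with mean $\mu$ and variance $\sigma^2$," written as $X = N(\mu, \sigma^2)$. The mean and standard deviation are,

\[
\langle X \rangle = \mu, \qquad \langle (X - \langle X \rangle)^2 \rangle^{1/2} = \sigma. \tag{A.22}
\]

As a consequence,

\[
\lim_{\sigma \to 0} N(\mu, \sigma^2) = \text{the sure variable } \mu. \tag{A.23}
\]

The multivariate Gaussian distribution is a straightforward generalization of the above. For the $n$-dimensional vector $\mathbf{x} \in \mathbb{R}^n$, the joint Gaussian distribution is,

\[
P(\mathbf{x}) = (2\pi)^{-n/2} (\det \mathbf{C})^{-1/2} \exp\left[ -\frac{1}{2} (\mathbf{x} - \langle \mathbf{x} \rangle)^T \cdot \mathbf{C}^{-1} \cdot (\mathbf{x} - \langle \mathbf{x} \rangle) \right], \tag{A.24}
\]

where $\langle \mathbf{x} \rangle = \boldsymbol{\mu}$ is the vector of averages and $\mathbf{C}$ is called the cross-correlation matrix, with elements given by $C_{ij} = \langle x_i x_j \rangle - \langle x_i \rangle \langle x_j \rangle$. By definition, $\mathbf{C}$ is a symmetric, positive-semidefinite matrix.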


4. Cauchy/Lorentz. The Cauchy random variable has probability density function,

\[
f(x) = \frac{(\sigma/\pi)}{(x - \mu)^2 + \sigma^2}, \tag{A.25}
\]

(see Figure A.3D). The cumulative distribution function is correspondingly given by,

\[
F(x) = \int_{-\infty}^{x} \frac{(\sigma/\pi)}{(x' - \mu)^2 + \sigma^2}\, dx' = \frac{1}{2} + \frac{1}{\pi} \arctan\left[ \frac{x - \mu}{\sigma} \right], \tag{A.26}
\]

(see Figure A.4D). A random variable drawn from this distribution is said to be "Cauchy distributed about $\mu$ with half-width $\sigma$," written as $X = C(\mu, \sigma)$. Although the distribution satisfies the normalization condition, the tails of the distribution vanish so slowly that one can show that no higher moments of the distribution are defined! Nevertheless, since the probability density function is one of many representations of the Dirac-delta function, it follows that,

\[
\lim_{\sigma \to 0} C(\mu, \sigma) = \text{the sure variable } \mu. \tag{A.27}
\]

A.3 Central limit theorem 


I know of scarcely anything so apt to impress the imagination 
as the wonderful form of cosmic order expressed by the “Law of 
Frequency of Error”. The law would have been personified by 
the Greeks and deified, if they had known of it. It reigns with 
serenity and in complete self-effacement, amidst the wildest 
confusion. The huger the mob, and the greater the apparent 
anarchy, the more perfect is its sway. It is the supreme law of 
Unreason. Whenever a large sample of chaotic elements are 
taken in hand and marshaled in the order of their magnitude, 
an unsuspected and most beautiful form of regularity proves 
to have been latent all along. 


-Sir Francis Galton (1889).

In its most restrictive form, the Central Limit Theorem states,

Let $X_1, X_2, X_3, \ldots, X_n$ be a sequence of $n$ independent and identically distributed random variables, each having finite values of expectation $\mu$ and variance $\sigma^2 > 0$. As the sample size $n$ increases ($n \to \infty$), the distribution of the random variable

\[
Y_n = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}}, \tag{A.28}
\]

approaches the standard normal distribution $N(0, 1)$.


This result was significantly generalized by Lyapunov (1901), who showed that it applies to independent random variables without identical distribution:

Let $\{X_i\}$ be a sequence of independent random variables defined on the same probability space. Assume that $X_i$ has finite mean $\mu_i$ and finite standard deviation $\sigma_i$. We define

\[
s_n^2 = \sum_{i=1}^{n} \sigma_i^2 .
\]

Assume that the third central moments

\[
r_n^3 = \sum_{i=1}^{n} E\{ |X_i - \mu_i|^3 \}
\]

are finite for every $n$, and that $r_n / s_n \to 0$ as $n \to \infty$ (the Lyapunov condition). Under these conditions, writing the sum of random variables $S_n = X_1 + \cdots + X_n$, with mean $m_n = \sum_{i=1}^{n} \mu_i$, and the variable defined by

\[
Z_n = \frac{S_n - m_n}{s_n},
\]

then the distribution of $Z_n$ converges to the standard normal distribution $N(0, 1)$.
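
The i.i.d. form of the theorem (Eq. A.28) can be checked numerically. The following is a minimal sketch, not a definitive test: the choice of exponentially distributed summands (for which $\mu = \sigma = 1$), the value $n = 50$, the number of trials and the seed are all illustrative assumptions rather than anything prescribed by the text.

```python
import math
import random

def standardized_sum(n, rng):
    """Return Y_n of Eq. A.28 for n summands X_i ~ E(1), where mu = sigma = 1."""
    mu, sigma = 1.0, 1.0
    total = sum(rng.expovariate(1.0) for _ in range(n))
    return (total - n * mu) / (sigma * math.sqrt(n))

rng = random.Random(2024)          # fixed (arbitrary) seed for reproducibility
n, trials = 50, 20_000
ys = [standardized_sum(n, rng) for _ in range(trials)]

mean = sum(ys) / trials
var = sum((y - mean) ** 2 for y in ys) / trials
frac_within_one = sum(1 for y in ys if abs(y) <= 1.0) / trials

# For N(0,1) one expects mean 0, variance 1, and about 68.3% of samples in [-1, 1].
print(mean, var, frac_within_one)
```

Although each summand is strongly skewed, the standardized sum already has statistics close to those of the unit normal at $n = 50$; repeating the run with small $n$ shows the residual skewness of the exponential distribution.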


A.4 Generating Random Numbers 


Notice the cumulative distribution function $F(x)$ by definition always lies between 0 and 1. It is possible, then, to obtain a sample $x$ from any given probability distribution by generating a unit uniform random variable $r \in U(0, 1)$, and solving $F(x) = r$ for $x$,

\[
x = F^{-1}(r). \tag{A.29}
\]

By spraying the vertical axis with a uniformly distributed random sampling, the cumulative distribution function reflects the sampling onto the horizontal axis, thereby transforming $r$ into the random variable $x$ with the desired statistics. Since $F'(x) = f(x)$, and $f(x) \ge 0$, the function $F(x)$ is strictly increasing wherever $f(x) \ne 0$, and the inverse $F^{-1}$ is uniquely defined over those domains where $f(x) \ne 0$.

For example, the exponentially distributed random variable $E(a)$ has the cumulative distribution function,

\[
F(x) = 1 - \exp(-a x). \tag{A.30}
\]

If we set $F(x) = r$, where $r$ is a unit uniform random variable, then

\[
x = \frac{1}{a} \ln\left( \frac{1}{1 - r} \right), \tag{A.31}
\]

is a random variable, exponentially distributed with decay constant $a$. Because $r$ is a unit uniform random variable, so too is $1 - r$, and one may equally well use $x = (1/a) \ln(1/r)$, which has the same distribution. This relation will be of particular interest in Chapter 4 when we consider numerical simulation of random processes.
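
The inversion of Eq. A.30 can be sketched in a few lines. This is an illustrative example only: the function name `sample_exponential`, the decay constant $a = 2$, the sample size and the seed are assumptions made here, not part of the text.

```python
import math
import random

def sample_exponential(a, rng):
    """Draw one sample from E(a) by solving F(x) = 1 - exp(-a x) = r for x."""
    r = rng.random()                        # unit uniform on [0, 1)
    return math.log(1.0 / (1.0 - r)) / a    # 1 - r lies in (0, 1], so the log is finite

rng = random.Random(12345)                  # fixed (arbitrary) seed for reproducibility
a = 2.0
samples = [sample_exponential(a, rng) for _ in range(200_000)]

mean = sum(samples) / len(samples)
std = math.sqrt(sum((x - mean) ** 2 for x in samples) / len(samples))

# Eq. A.17 predicts that the mean and standard deviation both equal 1/a = 0.5.
print(mean, std)
```

Using $1 - r$ rather than $r$ inside the logarithm is a small design choice: `random.random()` can return exactly 0 but never 1, so this form avoids taking the logarithm of zero.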


A.4.1 Joint inversion generating method

• Markov Processes, D. T. Gillespie (Academic Press, Inc, 1992), p. 53.


The following procedure is used to generate joint random variables, and is particularly useful in the generation of Gaussian-distributed random variables (see Exercise 3 on page 194).

Let $X_1$, $X_2$ and $X_3$ be three random variables with joint density function $P$. Define the functions $F_1$, $F_2$ and $F_3$ by

\[
\begin{aligned}
F_1(x_1) &= \int_{-\infty}^{x_1} P_1(x_1')\, dx_1', \\
F_2(x_2; x_1) &= \int_{-\infty}^{x_2} P_2^{(1)}(x_2' | x_1)\, dx_2', \\
F_3(x_3; x_1, x_2) &= \int_{-\infty}^{x_3} P_3^{(1,2)}(x_3' | x_1, x_2)\, dx_3',
\end{aligned}
\]

where the subordinate densities $P_k^{(i,j)}$ are defined as follows,

\[
\begin{aligned}
P_1(x_1) &= \iint P(x_1, x_2, x_3)\, dx_2\, dx_3, \\
P_2^{(1)}(x_2 | x_1) &= \frac{\int P(x_1, x_2, x_3)\, dx_3}{\iint P(x_1, x_2, x_3)\, dx_2\, dx_3}, \\
P_3^{(1,2)}(x_3 | x_1, x_2) &= \frac{P(x_1, x_2, x_3)}{\int P(x_1, x_2, x_3)\, dx_3}.
\end{aligned}
\]

Then if $r_1$, $r_2$ and $r_3$ are three independent unit uniform random numbers, the values $x_1$, $x_2$ and $x_3$ obtained by successively solving the three equations

\[
\begin{aligned}
F_1(x_1) &= r_1, \\
F_2(x_2; x_1) &= r_2, \\
F_3(x_3; x_1, x_2) &= r_3,
\end{aligned}
\]

are simultaneous sample values of $X_1$, $X_2$ and $X_3$.
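
One familiar two-variable instance of this procedure is sketched below, under stated assumptions. Two independent $N(0,1)$ variables written in polar coordinates $(s, \theta)$ have joint density $P(s, \theta) = s\, e^{-s^2/2} / 2\pi$, so that $F_1(s) = 1 - e^{-s^2/2}$ and $F_2(\theta; s) = \theta / 2\pi$. Solving $F_1 = r_1$ and $F_2 = r_2$ gives what is usually called the Box-Muller construction. The function name, sample size and seed are illustrative, and whether this is the route intended by the exercise cited above is not confirmed by the text.

```python
import math
import random

def sample_gaussian_pair(rng):
    """Joint inversion for the polar coordinates (s, theta) of two independent
    N(0,1) variables, with joint density P(s, theta) = s*exp(-s**2/2)/(2*pi)."""
    r1, r2 = rng.random(), rng.random()
    # Solve F1(s) = 1 - exp(-s**2/2) = r1 for s.
    s = math.sqrt(2.0 * math.log(1.0 / (1.0 - r1)))
    # Solve F2(theta; s) = theta/(2*pi) = r2 for theta (independent of s here).
    theta = 2.0 * math.pi * r2
    return s * math.cos(theta), s * math.sin(theta)

rng = random.Random(99)                     # fixed (arbitrary) seed
pairs = [sample_gaussian_pair(rng) for _ in range(100_000)]
xs = [p[0] for p in pairs]
ys = [p[1] for p in pairs]

n = len(pairs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
var_x = sum((x - mean_x) ** 2 for x in xs) / n
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n

# Expect mean near 0, variance near 1, and covariance near 0.
print(mean_x, var_x, cov_xy)
```

The point of the change to polar coordinates is that both $F_1$ and $F_2$ can be inverted in closed form, whereas the one-dimensional Gaussian cumulative distribution function (Eq. A.20) cannot.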


A.5 Change of Variables 


• D. T. Gillespie (1983) "A theorem for physicists in the theory of random variables," American Journal of Physics 51: 520.


Given a random variable $X$ with density distribution $P(x)$, and a new random variable $Y$ defined by the transformation,

\[
y = g(x), \tag{A.32}
\]

(where $g(\cdot)$ is a given deterministic function), the probability density $Q(y)$ of this new variable is given by,

\[
Q(y) = \int_{-\infty}^{\infty} P(x)\, \delta(y - g(x))\, dx. \tag{A.33}
\]

Here, as always, $\delta(\cdot)$ is the Dirac-delta function. In general, for a vector of random variables $\{X_i\}_{i=1}^{n}$ with density $P(\mathbf{x})$, and transformed variables $\{y_j = g_j(\mathbf{x})\}_{j=1}^{m}$, the joint density for $\mathbf{Y}$ is,

\[
Q(\mathbf{y}) = \int_{-\infty}^{\infty} P(\mathbf{x}) \prod_{j=1}^{m} \delta(y_j - g_j(\mathbf{x}))\, d\mathbf{x}. \tag{A.34}
\]

If $m = n$, then making the change of variables $z_i = g_i(\mathbf{x})$ in the integral above, we arrive at the formula for a transformation of probability density functions,

\[
Q(\mathbf{y}) = P(\mathbf{x}) \left| \frac{\partial(x_1, x_2, \ldots, x_n)}{\partial(y_1, y_2, \ldots, y_n)} \right|, \tag{A.35}
\]

where $|\partial \mathbf{x} / \partial \mathbf{y}|$ is the absolute value of the determinant of the Jacobian of the transformation.
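
As a worked illustration of Eq. A.33 (a sketch under the assumption that $X = U(0,1)$ and $g(x) = -(1/a) \ln x$ with $a > 0$; the symbol $x^*$ for the root is introduced here), one recovers the exponential density used in the inversion method. With $P(x) = 1$ on $[0, 1]$,

\[
Q(y) = \int_0^1 \delta\left( y + \frac{1}{a} \ln x \right) dx .
\]

The argument of the delta function vanishes at the single point $x^* = e^{-a y}$, which lies in $(0, 1]$ only for $y \ge 0$. Using $\delta(h(x)) = \delta(x - x^*) / |h'(x^*)|$ with $h(x) = y + (1/a) \ln x$ and $|h'(x^*)| = 1/(a x^*) = e^{a y}/a$,

\[
Q(y) = \frac{1}{|h'(x^*)|} = a\, e^{-a y}, \qquad y \ge 0,
\]

and $Q(y) = 0$ for $y < 0$, in agreement with Eq. A.15. The same answer follows from Eq. A.35 with $n = 1$, since $|dx/dy| = a\, e^{-a y}$.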


A.6 Monte-Carlo simulation 


• P. J. Nahin, Digital Dice (Princeton, 2008).


There are situations when exact solutions to probability questions are 
very difficult to calculate, or perhaps the approach to a solution is not 
clear. Often, Monte Carlo simulations may be the only solution strategy 
available, providing a straightforward method to gain some insight into 
the behaviour of the system. 

The main idea is to use a random number generator and a program loop to estimate the averaged outcome of a particular experiment. For example, if I want to know the average number of times I will get 4 heads in a row flipping a fair coin, I could sit down and flip a coin 4 times, recording the outcome each time, and repeat this hundreds or thousands of times, then take the number of times that 4 heads appear in my list, and divide this by the number of 4-flip trials I have done. The larger the number of trials, the better my estimate for the average will be. The data in the table below comes from a random number generator and a for loop in place of flipping an actual coin. For comparison, the Matlab script used to generate the data is also shown (Figure A.5).

A)
    N=input('How many trials?');
    number_of_4_heads=0;
    for i=1:N
        % four independent tosses; a value above 0.5 counts as heads
        f1=rand;
        f2=rand;
        f3=rand;
        f4=rand;
        if f1>0.5 && f2>0.5 && f3>0.5 && f4>0.5
            number_of_4_heads=number_of_4_heads+1;
        end
    end
    sprintf('Average number of 4 heads tossed: %0.6f',number_of_4_heads/N)

B)
    Number of trials, N    Estimate of probability to flip 4 heads
    10^3                   0.061000
    10^4                   0.063300
    10^5                   0.062140
    10^6                   0.062643
    10^7                   0.062450
    Exact, 1/2^4           0.062500

Figure A.5: A) Matlab routine to simulate a four-toss experiment, recording the number of times 4 heads are flipped. The only idiosyncratic commands are rand, which calls a unit uniform random number generator, and sprintf, which displays the output as a formatted string; in particular, the %0.6f format ensures that the output is displayed with 6 decimal places. B) Output simulation data. As the number of trials increases, the Monte Carlo estimate approaches the analytic value.

A more sophisticated example comes from trying to estimate the value of an integral using Monte Carlo methods. Suppose we would like to estimate the area of a quarter-circle of radius 1. One approach is to randomly generate points in the 1 x 1 square in the first quadrant, and count how many of these lie inside the circle (i.e. $x^2 + y^2 \le 1$), divided by the total number of points generated. We then assume that fraction is an adequate representation of the fraction of the area of the 1 x 1 square that lies within the quarter-circle (Figure A.6). Again, a table of data and the routine used to generate that data are shown below (Figure A.7). More challenging applications of Monte Carlo methods appear in the exercises.

Figure A.6: The fraction of randomly generated points lying inside the quarter circle provides an estimate of the area.
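
The statement that more trials give a better estimate can be made quantitative with a short calculation (a sketch using standard binomial statistics; the symbols $k$, $\hat{p}$ and $\sigma_{\hat{p}}$ are introduced here). If each trial succeeds independently with probability $p$, and $k$ successes are counted in $N$ trials, the estimate $\hat{p} = k/N$ has standard deviation

\[
\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{N}},
\]

so the error decreases only as $N^{-1/2}$: a hundred times more trials buys one more decimal digit. For the four-heads experiment, $p = 1/16$ and $\sqrt{p(1-p)} \approx 0.242$, giving

\[
\sigma_{\hat{p}} \approx 7.7 \times 10^{-3} \ (N = 10^3), \qquad \sigma_{\hat{p}} \approx 7.7 \times 10^{-5} \ (N = 10^7),
\]

which is consistent with the deviations from $0.0625$ seen in the simulation data for those values of $N$ ($1.5 \times 10^{-3}$ and $5 \times 10^{-5}$, respectively).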


A)
    N=input('How many points?');
    pts_inside=0;
    for i=1:N
        % one point drawn uniformly from the 1 x 1 square
        p1=rand;
        p2=rand;
        if p1^2+p2^2 <=1
            pts_inside=pts_inside+1;
        end
    end
    sprintf('Area of the quarter circle: %0.6f',pts_inside/N)

B)
    Estimate of area of quarter circle, for increasing number of points N:
    0.805000
    0.786300
    0.783310
    0.785315
    0.785627
    Exact, pi/4: 0.785398

Figure A.7: A) Matlab routine to simulate points inside the 1 x 1 square, recording the number of points lying inside the quarter circle of radius 1. B) Output simulation data. As the number of points increases, the Monte Carlo estimate approaches the analytic value.


A.7 Bertrand's Paradox


The classical theory of probability is, from a mathematical point of view, nothing but transformations of variables. Some probability distribution is given a priori on a set of elementary events; the problem is then to transform it into a probability distribution for the possible outcomes, each of which corresponds to a collection of elementary events. For example, when 2 dice are cast there are 36 elementary events and they are assumed to have equal a priori probability; the problem is to find the probability for the various totals by counting the number of elementary events that make up each total.

Mathematics can only derive probabilities of outcomes from a given a priori distribution. In applications to the real world, one must therefore decide which a priori distribution correctly describes the actual situation. (This is not a mathematical problem.) In problems of gambling, or balls in urns, the correct choice (or at least the one meant by the author) is usually clear, so it is frequently not stated explicitly. This has led to the erroneous view that pure mathematics can conjure up the correct probability for actual events to occur, with an enormous literature of semi-philosophical character. For a review, see for example J. R. Lucas "The concept of probability," Clarendon, Oxford 1970.

One attempt to avoid an explicit assumption of the a priori probability distribution is the so-called Principle of Insufficient Reason. It states that two elementary events have the same probability when there is no known reason why they should not. At best, this can be considered a working hypothesis; no philosophical principle can tell whether a die is loaded or not!

The danger of intuitive ideas about equal probabilities has been beautifully illustrated by Bertrand (Calcul des probabilités, Gauthier-Villars, Paris 1889):

Take a fixed circle of radius 1, and draw at random a straight line intersecting it. What is the probability that the chord has length $> \sqrt{3}$ (the length of the side of the inscribed equilateral triangle)?

Answer 1: Take all random lines through a fixed point P on the edge of the circle (Figure A.8). Apart from the tangent (zero probability), all such lines will intersect the circle. For the chord to be $> \sqrt{3}$, the line must lie within an angle of 60° out of a total 180°. Hence, $p = \frac{1}{3}$.

Answer 2: Take all random lines perpendicular to a fixed diameter. The chord is $> \sqrt{3}$ when the point of intersection lies on the middle half of the diameter. Hence, $p = \frac{1}{2}$.

Answer 3: For a chord to have length $> \sqrt{3}$, its center must lie at a distance less than $\frac{1}{2}$ from the center of the circle. The area of a circle of radius $\frac{1}{2}$ is a quarter of that of the original circle. Hence, $p = \frac{1}{4}$.

Figure A.8: A circle of radius 1, with an inscribed equilateral triangle with sides of length $\sqrt{3}$.

Each solution is based upon a different assumption about equal a priori probabilities. The loose phrase at random does not sufficiently specify the a priori probability to choose among the solutions.
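
The three answers can be reproduced by Monte Carlo simulation, which makes explicit that each corresponds to a different sampling rule for the chord. The sketch below is illustrative only: the function names, the number of trials and the seed are assumptions made here, and each sampler is one possible reading of the corresponding answer above.

```python
import math
import random

SQRT3 = math.sqrt(3.0)

def chord_fixed_point(rng):
    """Answer 1: line through a fixed point on the circle, direction uniform
    over 180 degrees; phi is the angle between the line and the tangent."""
    phi = rng.uniform(0.0, math.pi)
    return 2.0 * math.sin(phi)

def chord_fixed_diameter(rng):
    """Answer 2: line perpendicular to a fixed diameter, crossing it at a
    point d chosen uniformly along the diameter."""
    d = rng.uniform(-1.0, 1.0)
    return 2.0 * math.sqrt(1.0 - d * d)

def chord_random_midpoint(rng):
    """Answer 3: chord midpoint uniform over the disk (by rejection sampling)."""
    while True:
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        rho2 = x * x + y * y
        if rho2 <= 1.0:
            return 2.0 * math.sqrt(1.0 - rho2)

def estimate(sampler, trials, rng):
    """Fraction of sampled chords longer than sqrt(3)."""
    return sum(1 for _ in range(trials) if sampler(rng) > SQRT3) / trials

rng = random.Random(7)                      # fixed (arbitrary) seed
trials = 200_000
p1 = estimate(chord_fixed_point, trials, rng)
p2 = estimate(chord_fixed_diameter, trials, rng)
p3 = estimate(chord_random_midpoint, trials, rng)

# The three estimates should lie near 1/3, 1/2 and 1/4 respectively.
print(p1, p2, p3)
```

All three samplers draw "a chord at random"; they differ only in which quantity is taken to be uniformly distributed, which is precisely the a priori assumption the paradox turns on.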


Suggested References 


Much of this chapter was taken from,

• Stochastic Processes in Physics and Chemistry (2nd Ed.), N. G. van Kampen (North-Holland, 2001),

and

• Markov Processes, D. T. Gillespie (Academic Press, Inc, 1992).

Gillespie's book is particularly useful if one wants to develop practical schemes for simulating random processes. In that direction,

• An Introduction to Stochastic Processes in Physics, D. S. Lemons (Johns Hopkins University Press, 2002),

is also very useful.


Exercises 


1. Flipping a biased coin: Suppose you have a biased coin, showing heads with a probability $p \ne 1/2$. How could you use this coin to generate a random string of binary outcomes, each with a probability of exactly 1/2?


2. Characteristic functions:

(a) Derive the characteristic function $\phi(\omega)$ for the normal distribution $N(\mu, \sigma^2)$. Use this characteristic function to show that, for two independent normal random variables,

\[
a N(\mu_1, \sigma_1^2) + b N(\mu_2, \sigma_2^2) = N(a\mu_1 + b\mu_2,\ a^2\sigma_1^2 + b^2\sigma_2^2).
\]

(b) Use the characteristic function to show that for an arbitrary random variable the third-order cumulant $\kappa_3$ is expressed in terms of the moments $\mu_1$, $\mu_2$, and $\mu_3$, as follows,

\[
\kappa_3 = \mu_3 - 3\mu_2\mu_1 + 2\mu_1^3 .
\]

(c) Use the characteristic function for an arbitrary random variable to prove the weak version of the central limit theorem. That is, show that for a series of independent, identically distributed random variables $X_i$ with mean $\mu$ and standard deviation $\sigma$, the characteristic function of the random variable

\[
Y_n = \sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma\sqrt{n}} \right)
\]

converges to the characteristic function of the unit normal distribution $N(0, 1)$ as $n \to \infty$. Hint: Let

(A.36)

and write out the Taylor series for the characteristic function of $Z_i$. Use this to derive an expression for the characteristic function of $Y_n$ in the limit $n \to \infty$.

(d) Write a Monte Carlo simulation to verify the central limit theorem. Start with any kind of nonlinear transformation of a unit uniform random number, and plot a histogram of the distribution of the sum Eq. A.28. What do you notice for small $n$? Compare the histogram to a unit normal distribution: beyond what values of $n$ are the plots indistinguishable?


3. Multivariate Gaussian distribution: Show that if the correlation matrix $\mathbf{C}$ is diagonal ($C_{ij} = 0$; $i \ne j$), then $\mathbf{x}$ is a vector of independent Gaussian random variables.


4. Generating random numbers: Very powerful and reliable schemes have been developed to generate unit uniform random numbers. But we often want random numbers drawn from more exotic distributions.

(a) Write the generation formulas for $x$ drawn from $U(a, b)$ and $C(\mu, \sigma)$ in terms of the unit uniform variable $r$. Do you foresee a problem trying to do the same for $x$ drawn from $N(\mu, \sigma^2)$?

(b) Given the discrete probability distribution $P(n)$, show that for $r$ drawn from a unit uniform distribution $U(0, 1)$, the integer $n$ satisfying,

\[
\sum_{n' = -\infty}^{n - 1} P(n') \le r < \sum_{n' = -\infty}^{n} P(n'),
\]

is a realization drawn from $P(n)$. Hint: What is the probability that $a \le r < b$, for $0 \le a < b \le 1$?


5. Monte Carlo simulation: Often analytic solutions to probability questions are prohibitively difficult. In those cases, or simply to gain some intuition for the process, stochastic simulation is indispensable. Estimate the solution of the following problems by running a Monte Carlo simulation.

(a) Suppose you have to match members of two lists, for example a list of architects and their famous buildings. Each member of one list maps uniquely to a member of the other list. For a list of 15 members, what is the average number of correct connections that will be made if each member of one list is randomly connected to another member of the other list (in a one-to-one fashion)? How does this average change for a list with 5 members? With 100 members?

(b) Suppose you have a dozen eggs. You take them out of the carton and wash them. When you put them back, what is the probability that none of the eggs end up in their original location? What if you have 100 eggs? Do you recognize the reciprocal of this probability?

The probability that after permutation no member returns to where it began is called the derangement probability, and was computed explicitly for an arbitrary number of elements by Euler (1751) using a very crafty argument.

(c) Two (independent) bus lines operate from a stop in front of your apartment building. One bus arrives every hour, on the hour, the other an unknown, but constant, fraction $x$ of an hour later (where, for lack of additional information, we assume $x$ is a unit uniform random variable). What is the average wait time for a rider arriving at the stop at random? What about the case for $n$ independent lines running from the same stop? By running simulations for different $n$, can you speculate about a general formula for all $n$?


6. Change of variables: To a large extent, classical probability theory is simply a change of variables. The following exercises test that notion.

(a) Ballistics: (van Kampen, 2001) A cannon shoots a ball with initial velocity $v$ at an angle $\theta$ with the horizontal. Both $v$ and $\theta$ have uncertainty given by Gaussian distributions with variance $\sigma_v^2$ and $\sigma_\theta^2$, centered on $v_0$ and $\theta_0$, respectively. The variance is narrow enough that negative values of $v$ and $\theta$ can be ignored. Find the probability distribution for the distance traveled by the cannon ball.

(b) Single-slit diffraction: (Lemons, 2002) According to the probability interpretation of light, formulated by Max Born in 1926, light intensity at a point is proportional to the probability that a photon exists at that point.

i. Imagine shooting marbles through a slit so that each angle of forward propagation $\theta$ is the uniform random variable $U(0, \pi/2)$ (see Figure A.9). What is the probability density $f(x)$ that a marble passing through a narrow slit will arrive at position $(x, x + dx)$ on a screen parallel to and at a distance $d$ beyond the barrier?

Figure A.9: Single-slit diffraction. From Lemons (2002), p. 30.

ii. Photons of light do not behave like marbles. The light intensity produced by diffraction through a single, narrow slit, as found in almost any introductory physics text, is proportional to

\[
I \propto \frac{1}{r} \frac{\sin^2[(\pi a / \lambda) \sin\theta]}{\sin^2\theta},
\]

where $r$ is the distance from the center of the slit to an arbitrary place on the screen, $a$ is the slit width, and $\lambda$ the light wavelength. Nonetheless, show that for slits so narrow that $\pi a / \lambda \ll 1$, the above light intensity is proportional to the marble probability density derived in part 6(b)i.

(c) Linear noise approximation: If we define the random variables $\mathbf{n}(t)$ by the linear (though time-dependent) transformation,

\[
n_i = \Omega\, x_i(t) + \Omega^{1/2} \alpha_i(t), \qquad i = 1, 2, \ldots, N,
\]

where $\boldsymbol{\alpha}$ is a random variable with density $\Pi(\boldsymbol{\alpha}, t)$, $\Omega$ is a constant, and $\mathbf{x}(t)$ is a deterministic function of time, then show that the density $P(\mathbf{n}, t)$ is simply,

\[
P(\mathbf{n}, t) = \Omega^{-N/2}\, \Pi(\boldsymbol{\alpha}, t).
\]


APPENDIX B


REVIEW OF MATHEMATICAL METHODS


There are several mathematical methods we return to again and again in this course. In the main notes, it is assumed that the reader is familiar with matrix algebra, linear stability analysis, Laplace/Fourier transforms, and some elementary results from asymptotic analysis. As a refresher, more background is provided in this appendix.


B.1 Matrix Algebra and Time-Ordered Exponentials


The main application of matrix algebra in this course will be to codify a system of differential equations in a compact way. As such, most of the methods outlined in this appendix will be written in vector-matrix notation.


Matrix form of differential equations 


A system of high-order differential equations can always be written as a larger system of first-order differential equations by the simple expedient of introducing new state variables for each of the derivatives. For example, the equation describing simple harmonic motion can be written either as a second-order differential equation or as a 2 x 2 system of first-order differential equations,

\[
\frac{d^2 x}{dt^2} + \omega^2 x = 0 \iff \frac{d}{dt} \begin{bmatrix} x \\ \dot{x} \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -\omega^2 & 0 \end{bmatrix} \begin{bmatrix} x \\ \dot{x} \end{bmatrix}, \tag{B.1}
\]

where the first line of the matrix equation defines the new state variable $\dot{x}$. In many ways, the matrix form of the equation is preferable since solution methods immediately generalize to higher-order systems. For a fully nonlinear system, where $\mathbf{x} \in \mathbb{R}^n$,

\[
\frac{d}{dt} \mathbf{x} = \mathbf{F}(\mathbf{x}), \tag{B.2}
\]

where $\mathbf{F} = (F_1(\mathbf{x}), F_2(\mathbf{x}), \ldots, F_n(\mathbf{x}))$ is the reaction rate vector. There is no general solution for such systems. If, however, the reaction rate vector is linear in the state variables, there are formal solutions.


Autonomous linear differential equations 


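Autonomous means that the coefficients appearing in the differential equation
are constant. For a homogeneous system (i.e. zero right-hand side), the
equation reads,

$$ \frac{d\mathbf{x}}{dt} - \mathbf{A}\cdot\mathbf{x} = 0; \qquad \mathbf{x}(0) = \mathbf{x}_0. \tag{B.3} $$

In direct analogy with the one-dimensional system, the fundamental solutions
of this equation are simply exponentials, albeit with matrix arguments.
We define the matrix exponential as,

$$ \exp\mathbf{A} = \mathbf{I} + \mathbf{A} + \frac{1}{2!}\mathbf{A}^2 + \ldots = \sum_{n=0}^{\infty}\frac{\mathbf{A}^n}{n!}. \tag{B.4} $$

Here, $\mathbf{I}$ is the identity matrix,

$$ \mathbf{I} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}. $$

With the matrix exponential, the formal solution of Eq. B.3 is simply,

$$ \mathbf{x}(t) = \exp[\mathbf{A}t]\cdot\mathbf{x}_0. \tag{B.5} $$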
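As a rough numerical illustration of Eqs. B.4 and B.5, the sketch below sums a truncated version of the series for the matrix exponential and applies it to the harmonic oscillator of Eq. B.1. The function names, the number of terms kept, and the choice $\omega = 2$ are assumptions made for this example only; a truncated series is adequate here because $\|\mathbf{A}t\|$ is small, but it is not a robust general-purpose method.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=30):
    """Truncated series I + A + A^2/2! + ... (Eq. B.4); 'terms' is an assumption."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]            # current term A^m / m!
    for m in range(1, terms):
        term = [[x / m for x in row] for row in mat_mul(term, A)]
        result = [[r + t for r, t in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    return result

def propagate(A, x0, t):
    """x(t) = exp[A t] . x0 for the autonomous system dx/dt = A . x (Eq. B.5)."""
    E = mat_exp([[a * t for a in row] for row in A])
    return [sum(E[i][j] * x0[j] for j in range(len(x0)))
            for i in range(len(x0))]

# Harmonic oscillator of Eq. B.1 with omega = 2, started at x = 1, dx/dt = 0.
omega = 2.0
A = [[0.0, 1.0], [-omega ** 2, 0.0]]
x, v = propagate(A, [1.0, 0.0], t=0.7)
print(x, math.cos(omega * 0.7))            # position vs. exact cos(omega t)
print(v, -omega * math.sin(omega * 0.7))   # velocity vs. exact -omega sin(omega t)
```

The two printed pairs should agree closely, which is a quick check that the matrix form reproduces the familiar scalar solution.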


284 Applied stochastic processes 


Time-ordered exponentials for non-autonomous differential equations


• R. Feynman (1951) “An operator calculus having applications in
quantum electrodynamics,” Physical Review 84: 108-128.


• N. G. van Kampen (1974) “A cumulant expansion for stochastic
linear differential equations. I,” Physica 74: 215-238.


Non-autonomous means the coefficients in the differential equation are
no longer constant. Consider, for example, the differential equation with
time-dependent coefficients,¹

$$ \frac{d\mathbf{Y}}{dt} = \mathbf{A}(t)\cdot\mathbf{Y}; \qquad \mathbf{Y}(0) = \mathbf{I}. $$

By iteration, the solution can be expressed as an infinite series,

$$ \mathbf{Y}(t) = \mathbf{I} + \int_0^t \mathbf{A}(t_1)\,dt_1 + \int_0^t\!\!\int_0^{t_1} \mathbf{A}(t_1)\mathbf{A}(t_2)\,dt_2\,dt_1 + \ldots \tag{B.6} $$

$$ = \sum_{n=0}^{\infty}\int_0^t dt_1 \int_0^{t_1} dt_2 \cdots \int_0^{t_{n-1}} dt_n\, \mathbf{A}(t_1)\mathbf{A}(t_2)\cdots\mathbf{A}(t_n). \tag{B.7} $$

If the matrices $\mathbf{A}(t_n)$ commute with one another, this series can be
written simply as an exponential of an integral (as in the autonomous case
discussed above),

$$ \mathbf{Y}(t) = \exp\left[\int_0^t \mathbf{A}(t')\,dt'\right] = \sum_{n=0}^{\infty}\frac{1}{n!}\int_0^t dt_1 \int_0^t dt_2 \cdots \int_0^t dt_n\, \mathbf{A}(t_1)\mathbf{A}(t_2)\cdots\mathbf{A}(t_n), $$

where the $1/n!$ is included to accommodate the extended domain of integration
$[0,t]$ for each iterate. If the matrices $\mathbf{A}(t_n)$ do not commute
with one another (as is typically the case), then a time-ordering operator
$\lceil\;\rceil$ is used to allow the infinite series, Eq. B.7, to be written as an
exponential,

$$ \mathbf{Y}(t) = \sum_{n=0}^{\infty}\int_0^t dt_1 \int_0^{t_1} dt_2 \cdots \int_0^{t_{n-1}} dt_n\, \mathbf{A}(t_1)\mathbf{A}(t_2)\cdots\mathbf{A}(t_n) \equiv \left\lceil \exp\left[\int_0^t \mathbf{A}(t')\,dt'\right] \right\rceil. \tag{B.8} $$

¹ Equations of this sort cannot be readily solved using Laplace or Fourier transform
methods.

Many equations, for example the Mathieu equation discussed in Section 8.4,
can be written formally as a time-ordered exponential, although
the explicit solution cannot be written in terms of elementary functions.


B.2 Linear Stability Analysis 


• L. Edelstein-Keshet, Mathematical models in biology, (SIAM, 2005). Chapter 4.


Suppose we have the nonlinear system of differential equations,

$$ \frac{d\mathbf{x}}{dt} = \mathbf{F}(\mathbf{x}). \tag{B.9} $$

We can determine the equilibrium, or fixed, points of this system by
looking for values of the state variables such that,

$$ \frac{d\mathbf{x}}{dt} = 0, $$

or, equivalently, those values $\mathbf{x}^\star$ that satisfy the condition,

$$ \mathbf{F}(\mathbf{x}^\star) = 0. \tag{B.10} $$


Once these equilibrium points have been determined, either analytically 
or numerically, the question arises as to whether or not they are stable in 
the sense that small perturbations away from x* return to x*. 


Eigenvalues and the fundamental modes 


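For very small perturbations $\mathbf{x}_p$, the local dynamics near $\mathbf{x}^\star$ can be
adequately described by a linearization of the rates. Starting from the
governing equation, Eq. B.9, written for $\mathbf{x} = \mathbf{x}^\star + \mathbf{x}_p$,

$$ \frac{d\mathbf{x}_p}{dt} = \mathbf{F}(\mathbf{x}^\star + \mathbf{x}_p), $$

we use the Taylor series to approximate $\mathbf{F}$ for small $\mathbf{x}_p$,

$$ \mathbf{F}(\mathbf{x}^\star + \mathbf{x}_p) \approx \mathbf{F}(\mathbf{x}^\star) + \mathbf{J}\cdot\mathbf{x}_p = \mathbf{J}\cdot\mathbf{x}_p, $$

where $\mathbf{J}$ is called the Jacobian or response matrix of $\mathbf{F}$,

$$ J_{ij} = \frac{\partial F_i(\mathbf{x}^\star)}{\partial x_j}. $$

The dynamics of the small perturbation modes $\mathbf{x}_p$ are then governed
by the homogeneous linear equation,

$$ \frac{d\mathbf{x}_p}{dt} = \mathbf{J}\cdot\mathbf{x}_p. \tag{B.11} $$

For a one-dimensional system, we know that the solution is an exponential
$x_p \propto \exp \lambda t$. Assuming the same holds true in multiple dimensions, we
make the ansatz,

$$ \mathbf{x}_p(t) = \mathbf{v}\,e^{\lambda t}. \tag{B.12} $$

Substituting into both sides of Eq. B.11, and cancelling $e^{\lambda t}$, we arrive at
the constraint on $\lambda$,

$$ \lambda\mathbf{v} = \mathbf{J}\cdot\mathbf{v}. \tag{B.13} $$

Or, writing $\lambda\mathbf{v} = \lambda\mathbf{I}\cdot\mathbf{v}$,

$$ [\lambda\mathbf{I} - \mathbf{J}]\cdot\mathbf{v} = 0. \tag{B.14} $$

Non-trivial solutions, $\mathbf{v} \neq 0$, are only possible if the matrix on the
left-hand side is not invertible, i.e.,

$$ \det[\lambda\mathbf{I} - \mathbf{J}] = 0. \tag{B.15} $$

This is called the resolvent equation for the linear operator $\mathbf{J}$, and values
$\lambda$ that satisfy this condition are called the eigenvalues of $\mathbf{J}$. The importance
of the eigenvalues comes from what they tell us about the long-time
stability of the perturbation modes $\mathbf{x}_p$. Since we have $\mathbf{x}_p \propto e^{\lambda t}$, if any of
the eigenvalues have a positive real part, the perturbations will grow and
we say that the equilibrium point $\mathbf{x}^\star$ is unstable.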
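The linearization procedure can be sketched in a few lines for a two-variable system. In the example below, the damped pendulum used as $\mathbf{F}$, the finite-difference step `h`, and all function names are assumptions introduced for illustration; the Jacobian is estimated numerically rather than derived by hand, and the eigenvalues follow from the $2 \times 2$ form of Eq. B.15.

```python
import cmath
import math

def jacobian(F, x_star, h=1e-6):
    """Central-difference estimate of J_ij = dF_i/dx_j at x_star (h is an assumption)."""
    n = len(x_star)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x_star), list(x_star)
        xp[j] += h
        xm[j] -= h
        Fp, Fm = F(xp), F(xm)
        for i in range(n):
            J[i][j] = (Fp[i] - Fm[i]) / (2.0 * h)
    return J

def eigenvalues_2x2(J):
    """Roots of det[lambda*I - J] = 0 for a 2x2 matrix."""
    tr = J[0][0] + J[1][1]
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2.0, (tr - disc) / 2.0

def is_stable(J):
    """True if every eigenvalue has a negative real part."""
    return all(lam.real < 0 for lam in eigenvalues_2x2(J))

# Hypothetical example: damped pendulum, x = (angle, angular velocity).
def F(x):
    return [x[1], -math.sin(x[0]) - 0.5 * x[1]]

for x_star in ([0.0, 0.0], [math.pi, 0.0]):   # both satisfy F(x*) = 0
    J = jacobian(F, x_star)
    print(x_star, eigenvalues_2x2(J), is_stable(J))
```

The hanging equilibrium is reported as stable and the inverted one as unstable, in line with the sign of the real parts of the eigenvalues.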


B.2.1 Routh-Hurwitz Criterion 


• L. Edelstein-Keshet, Mathematical models in biology, (SIAM, 2005). pp. 233-234.


Notice that the resolvent equation, Eq. B.15, is a polynomial in $\lambda$ with
degree equal to the dimensionality of our system. That is, if $\mathbf{J} \in \mathbb{R}^{n \times n}$,
then the resolvent equation is an $n$th-order polynomial. It would be nice
(particularly when the order is greater than 4 or some of the coefficients
are free parameters) if we had a condition for stability that did not
require that the $\lambda$ be computed explicitly. The Routh-Hurwitz criterion
provides just such a test, determining the stability from the coefficients
of the resolvent equation without explicit computation of $\lambda$.

Suppose our resolvent equation is the $k$th-order polynomial,

$$ \lambda^k + a_1\lambda^{k-1} + a_2\lambda^{k-2} + \ldots + a_k = 0; \tag{B.16} $$


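then define the following matrices,

$$ \mathbf{H}_1 = [a_1], \quad \mathbf{H}_2 = \begin{bmatrix} a_1 & 1 \\ a_3 & a_2 \end{bmatrix}, \quad \ldots, \quad \mathbf{H}_k = \begin{bmatrix} a_1 & 1 & 0 & \cdots & 0 \\ a_3 & a_2 & a_1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & \cdots & & a_k \end{bmatrix}, \tag{B.17} $$

where the $(l,m)$ element of the matrix $\mathbf{H}_j$ is

$$ \begin{cases} a_{2l-m} & \text{for } 0 < 2l - m \le k, \\ 1 & \text{for } 2l = m, \\ 0 & \text{otherwise.} \end{cases} \tag{B.18} $$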


It looks cumbersome written out for a general system, but it is easily 
coded into symbolic mathematics packages like Maple, Mathematica or 
Matlab. The stability criterion is the following, 


A necessary and sufficient condition for all of the roots of
the resolvent Eq. B.16 to have negative real parts is that the
determinants of the matrices $\mathbf{H}_j$ ($j = 1, \ldots, k$) are all positive ($> 0$).


For characteristic polynomials of low degree, the Routh-Hurwitz criteria
for stability can be stated simply and explicitly.


For a characteristic equation of degree $n$,

$$ \lambda^n + a_1\lambda^{n-1} + \ldots + a_{n-1}\lambda + a_n = 0, $$

the eigenvalues all have negative real parts if, and only if,

For $n = 2$: $a_1 > 0$, $a_2 > 0$.

For $n = 3$: $a_1 > 0$, $a_3 > 0$; $a_1 a_2 > a_3$.

For $n = 4$: $a_1 > 0$, $a_3 > 0$, $a_4 > 0$; $a_1 a_2 a_3 > a_3^2 + a_1^2 a_4$.
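The criterion is indeed easy to code. The sketch below builds the matrices $\mathbf{H}_j$ from the element rule of Eq. B.18 (with the convention $a_0 = 1$ and $a_i = 0$ for $i > k$) and tests the sign of each determinant. It is written in Python rather than a symbolic package, uses exact rational arithmetic to avoid round-off in the sign test, and all function names are assumptions made for this illustration.

```python
from fractions import Fraction

def hurwitz_matrix(coeffs, j):
    """H_j for lambda^k + a_1 lambda^(k-1) + ... + a_k, with coeffs = [a_1, ..., a_k]."""
    k = len(coeffs)

    def a(i):
        if i == 0:
            return Fraction(1)                  # coefficient of lambda^k
        return Fraction(coeffs[i - 1]) if 1 <= i <= k else Fraction(0)

    # (l, m) element is a_{2l-m}, following Eq. B.18
    return [[a(2 * l - m) for m in range(1, j + 1)] for l in range(1, j + 1)]

def det(M):
    """Determinant by Gaussian elimination with exact fractions."""
    M = [row[:] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return d

def routh_hurwitz_stable(coeffs):
    """True if all Hurwitz determinants det H_1, ..., det H_k are positive."""
    return all(det(hurwitz_matrix(coeffs, j)) > 0
               for j in range(1, len(coeffs) + 1))

# (lambda+1)(lambda+2)(lambda+3) = lambda^3 + 6 lambda^2 + 11 lambda + 6
print(routh_hurwitz_stable([6, 11, 6]))
# lambda^3 + lambda^2 + lambda + 6 = (lambda+2)(lambda^2 - lambda + 3)
print(routh_hurwitz_stable([1, 1, 6]))
```

The first polynomial has roots $-1, -2, -3$ and passes the test; the second has a complex pair with positive real part and fails at $\det\mathbf{H}_2 = a_1 a_2 - a_3 < 0$, matching the explicit $n = 3$ condition.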
B.3 Pattern-forming instabilities


For spatially-dependent models characterized by partial differential equa- 
tions, instabilities can manifest themselves in a variety of interesting ways 
including static pattern formation, traveling waves and variations on these 
themes. 


B.3.1 Turing-type instabilities 


• A. M. Turing (1952) Philosophical Transactions of the Royal Society - London
B 237: 37.


Pattern formation is ubiquitous in physical, chemical and biological
systems. One mechanism through which patterns can arise is the evolution
of a deterministically unstable steady state. For example, Turing
showed that for a reaction-diffusion model tending to a homogeneous equilibrium
state, diffusion can act to destabilize the steady solution. Moreover,
the system becomes destabilized to only a certain range of spatial
modes, leading to the emergence of regular patterning.

In a strictly deterministic system, the local stability is characterized
by the evolution of some small perturbation, $\mathbf{x}_p$, about the equilibrium
state. For sufficiently small amplitudes, the perturbation field obeys the
linearized mean-field equation,

$$ \frac{\partial\mathbf{x}_p}{\partial t} = \mathbf{A}\cdot\mathbf{x}_p + \mathbf{D}\cdot\nabla^2\mathbf{x}_p, $$

where $\mathbf{A}$ is the Jacobian of the reaction dynamics and $\mathbf{D}$ is the diffusion
matrix. Taking the Laplace and Fourier transforms in time and space,
respectively, the stability of the equilibrium state is determined by the
resolvent equation,

$$ \det\left[\lambda\mathbf{I} - \mathbf{A} + k^2\mathbf{D}\right] = 0. $$

The equilibrium is asymptotically stable if $\mathrm{Re}[\lambda] < 0$. Even though $\mathbf{A}$
may be stable (i.e. the eigenvalues of $\mathbf{A}$ lie in the left half of the complex
plane), Turing’s claim is that for a certain class of diffusivity matrices
$\mathbf{D}$ and a range of wavenumbers $k$, the roots $\lambda$ of the resolvent equation
are shifted to the right-hand side of the complex plane and the system
becomes unstable.

The analysis is made more transparent by considering a two-species
model in one spatial dimension. For a two-species model, the criteria for
$\mathbf{A}$ to have stable eigenvalues are,

$$ \mathrm{tr}\,\mathbf{A} < 0 \;\Rightarrow\; a_{11} + a_{22} < 0, \quad\text{and}\quad \det\mathbf{A} > 0 \;\Rightarrow\; a_{11}a_{22} - a_{12}a_{21} > 0. $$

The presence of diffusion introduces a re-scaling of the diagonal elements
of $\mathbf{A}$,

$$ a_{11} \to \hat{a}_{11} = a_{11} - D_1 k^2, \qquad a_{22} \to \hat{a}_{22} = a_{22} - D_2 k^2. $$

The conditions for stability then become,

$$ \mathrm{tr}\left[\mathbf{A} - k^2\mathbf{D}\right] < 0, \qquad \det\left[\mathbf{A} - k^2\mathbf{D}\right] > 0. $$

For diffusion to destabilize the steady-state, it must be that,

$$ \det\left[\mathbf{A} - k^2\mathbf{D}\right] < 0 \;\Rightarrow\; \hat{a}_{11}\hat{a}_{22} - a_{12}a_{21} < 0, \tag{B.19} $$


since the condition on the trace is automatically satisfied. The above
equation can be written explicitly as a quadratic in $k^2$,

$$ Q(k^2) \equiv k^4 - \frac{D_1 a_{22} + D_2 a_{11}}{D_1 D_2}\,k^2 + \frac{a_{11}a_{22} - a_{12}a_{21}}{D_1 D_2} < 0. $$

A sufficient condition for instability is that the minimum of $Q(k^2) < 0$.
Setting the derivative of $Q$ to zero, we arrive at an explicit expression for
the $k^2_{\min}$ that minimizes $Q(k^2)$,

$$ k^2_{\min} = \frac{D_1 a_{22} + D_2 a_{11}}{2 D_1 D_2}. $$

Finally, the condition for Turing-type instability can then be written as,

$$ Q(k^2_{\min}) < 0 \;\Rightarrow\; \det\mathbf{A} < \frac{(D_1 a_{22} + D_2 a_{11})^2}{4 D_1 D_2}. \tag{B.20} $$


The range of parameter space for which A is stable, but Eq. B.20 is 
satisfied, is called the Turing space. 
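A quick way to explore the Turing space is to evaluate the conditions above directly. The sketch below checks that $\mathbf{A}$ is stable without diffusion and that $Q(k^2_{\min}) < 0$; it also requires $k^2_{\min} > 0$ so that the unstable mode corresponds to a real wavenumber, which is an extra check added here and not stated explicitly in the text. The function name and the numerical entries of $\mathbf{A}$ are hypothetical.

```python
def turing_unstable(A, D1, D2):
    """Return (flag, k2_min): flag is True inside the Turing space.

    A is the 2x2 reaction Jacobian; D1, D2 are the two diffusion coefficients.
    """
    (a11, a12), (a21, a22) = A
    tr = a11 + a22
    det = a11 * a22 - a12 * a21
    stable_without_diffusion = tr < 0 and det > 0

    s = D1 * a22 + D2 * a11
    k2_min = s / (2.0 * D1 * D2)               # minimizer of Q(k^2)
    q_min = det / (D1 * D2) - k2_min ** 2      # Q evaluated at k2_min

    return stable_without_diffusion and k2_min > 0 and q_min < 0, k2_min

# Hypothetical activator-inhibitor Jacobian (tr A = -3, det A = 2).
A = [[1.0, -2.0], [3.0, -4.0]]
print(turing_unstable(A, D1=1.0, D2=20.0))   # widely separated diffusivities
print(turing_unstable(A, D1=1.0, D2=2.0))    # comparable diffusivities
```

With these numbers the instability appears only when $D_2$ is much larger than $D_1$, which anticipates the first limitation discussed next.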

The two major limitations associated with applications of Turing-type
pattern-formation in the modeling of natural systems are that,

1. The difference in the diffusion coefficients $D_1$ and $D_2$ must be quite
large ($D_2 \gg D_1$), typically at least an order of magnitude, to satisfy
the necessary conditions for pattern formation. In reality, unless one
of the species is immobilized, the diffusion coefficients are rarely very
different.


2. The instability that arises is periodic with some characteristic wavenumber
close to $k_{\min}$. In reality, spatial patterning in natural systems
is irregular, leading to a spectrum spread across a range of
wavenumbers.


B.3.2 Differential flow instabilities 
B.3.3 Other mechanisms 


B.4 Transform Methods


• G. James, Advanced Modern Engineering Mathematics, (Prentice Hall, 2005).


Laplace and Fourier transforms are a convenient method of exchanging 
the search for the solution of a differential equation (difficult) with the 
search for the solution of an algebraic equation (easier). The simplification 
of solving the problem in Laplace (or Fourier) space comes at the expense 
of a sometimes difficult (or impossible) inversion back to the original 
problem space. 

In the context of the present course, we will rarely be interested in
inverting the Laplace or Fourier transform back to the original problem
space since we will be using the properties of the transformed quantities
directly: the poles of the Laplace transform will be used to determine the
asymptotic stability (Section 8.5 on page 190 and Section 9.3 on page 208),
and the Fourier transform of the correlation function is related to the
power spectrum of the fluctuations (Section 2.3 on page 39).


B.4.1 Laplace Transform 


The Laplace transform of the vector function $\mathbf{f}(t) \in \mathbb{R}^n$ is given by the
integration of the function multiplied by $\exp[-s\mathbf{I}t]$,

$$ \mathcal{L}[\mathbf{f}(t)](s) \equiv \hat{\mathbf{F}}(s) = \int_0^{\infty} \mathbf{f}(t)\cdot e^{-s\mathbf{I}t}\,dt, \tag{B.21} $$

where $\mathbf{I}$ is the $n \times n$ identity matrix (for the transform of a scalar function,
set $n = 1$). Our interest in the Laplace transform comes from two useful
properties. First, using the definition of the transform and integration by
parts, it is possible to derive the relation:

$$ \mathcal{L}\left[\frac{d\mathbf{x}}{dt}\right] = s\mathbf{I}\cdot\hat{\mathbf{X}}(s) - \mathbf{x}(0). \tag{B.22} $$

Then, for a system of linear differential equations,

$$ \frac{d\mathbf{x}}{dt} = \mathbf{A}\cdot\mathbf{x}; \qquad \mathbf{x}(0) = \mathbf{x}_0, \tag{B.23} $$

the formal solution can always be expressed as a Laplace transform,

$$ \hat{\mathbf{X}}(s) = [s\mathbf{I} - \mathbf{A}]^{-1}\cdot\mathbf{x}_0. \tag{B.24} $$


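The second useful property is that the Laplace transform of a convolution
integral,

$$ \mathbf{g}(t) * \mathbf{f}(t) = \int_0^t \mathbf{g}(t-\tau)\cdot\mathbf{f}(\tau)\,d\tau, \tag{B.25} $$

is simply the product of the individual Laplace transforms,

$$ \mathcal{L}[\mathbf{g}(t) * \mathbf{f}(t)] = \mathcal{L}[\mathbf{g}(t)]\cdot\mathcal{L}[\mathbf{f}(t)] = \hat{\mathbf{G}}(s)\cdot\hat{\mathbf{F}}(s). \tag{B.26} $$

In that way, the formal solution of the linear convolution equation,

$$ \frac{d\mathbf{x}}{dt} = \mathbf{A}\cdot\mathbf{x} + \int_0^t \mathbf{g}(t-\tau)\cdot\mathbf{x}(\tau)\,d\tau; \qquad \mathbf{x}(0) = \mathbf{x}_0, \tag{B.27} $$

is,

$$ \hat{\mathbf{X}}(s) = \left[s\mathbf{I} - \mathbf{A} - \hat{\mathbf{G}}(s)\right]^{-1}\cdot\mathbf{x}_0. \tag{B.28} $$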


Asymptotic Stability 


• S. I. Grossman and R. K. Miller (1973) “Nonlinear Volterra integrodifferential
systems with L1-kernels,” Journal of Differential Equations 13: 551-566.


Notice from Eq. B.24 that $[s\mathbf{I} - \mathbf{A}]$ is not invertible precisely at those
values of $s$ for which,

$$ \det[s\mathbf{I} - \mathbf{A}] = 0. \tag{B.29} $$

This equation, called the resolvent equation, likewise determines the eigenvalues
of the matrix $\mathbf{A}$ (Section B.2). Therefore, the solution of the differential
equation Eq. B.23 is asymptotically stable ($\mathbf{x}(t) \to 0$ as $t \to \infty$)
if, and only if, those values of $s$ that solve the resolvent equation all have
negative real parts:

$$ \det[s\mathbf{I} - \mathbf{A}] = 0 \quad\Rightarrow\quad \mathrm{Re}(s) < 0. \tag{B.30} $$

A more general theorem (Grossman and Miller, 1973) states that asymptotic
stability of the convolution equation,

$$ \frac{d\mathbf{x}}{dt} = \mathbf{A}\cdot\mathbf{x} + \int_0^t \mathbf{g}(t-\tau)\cdot\mathbf{x}(\tau)\,d\tau, \tag{B.31} $$

is also guaranteed, provided the roots of the resolvent equation have negative
real parts,

$$ \det\left[s\mathbf{I} - \mathbf{A} - \hat{\mathbf{G}}(s)\right] = 0 \quad\Rightarrow\quad \mathrm{Re}(s) < 0. \tag{B.32} $$

Finally, a useful feature of the Laplace transform is that the $s \to 0$ limit
provides the $t \to \infty$ limit of the un-transformed function $\mathbf{x}(t)$, provided
that limit exists,

$$ \lim_{t\to\infty}\mathbf{x}(t) = \lim_{s\to 0} s\cdot\hat{\mathbf{X}}(s). \tag{B.33} $$


B.4.2 Fourier Transform 


The Fourier transform is very similar to the Laplace transform, but now
with an explicitly complex argument $i\omega$ and an unbounded domain of
integration,

$$ \mathcal{F}[f(t)](i\omega) \equiv \hat{F}(i\omega) = \int_{-\infty}^{\infty} f(t)\,e^{-i\omega t}\,dt. \tag{B.34} $$

The Fourier transform, in contrast to the Laplace or z-transform, is easily
inverted. That is,

$$ f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat{F}(i\omega)\,e^{i\omega t}\,d\omega. $$

Up to a factor of $2\pi$, the Fourier transform is its own inverse. Furthermore,
note the duality property of the Fourier transform,

$$ \mathcal{F}[\hat{F}(it)] = 2\pi f(-\omega). \tag{B.35} $$

The Fourier transform can be thought of as a projection of the frequency
content of a signal $f(t)$ onto the basis functions $\cos\omega t$ and $\sin\omega t$,
for a continuous distribution of frequencies. To illustrate this property,
we use the Dirac delta function to derive generalized Fourier transforms.
Dirac delta function


The Dirac delta function $\delta(t)$ is a distribution with the following sifting
property,

$$ \int_a^b f(t)\,\delta(t - c)\,dt = f(c) \qquad \text{for } a < c < b. $$


What happens for $c = a$ or $c = b$ is a matter of convention. If $\delta(t)$ is
defined as the limit of a sequence of functions, for example,

$$ \lim_{\tau_c \to 0}\frac{1}{2\tau_c}\,e^{-|\tau|/\tau_c} = \delta(\tau), $$


then it follows that,

$$ \int_a^b f(t)\,\delta(t - c)\,dt = \frac{1}{2} f(c), \qquad c = a, b. $$

One can show that this convention is consistent with Stratonovich’s interpretation
of the nonlinear Langevin equation. Itô’s interpretation, on
the other hand, is consistent with the convention,

$$ \int_a^b f(t)\,\delta(t - c)\,dt = \begin{cases} f(c), & c = a, \\ 0, & c = b. \end{cases} $$

For the purposes of deriving generalized Fourier transforms, these distinctions
are immaterial, as we will be considering an unbounded domain of
integration.


Generalized Fourier transforms 


By the sifting property,

$$ \mathcal{F}[\delta(t)] = \int_{-\infty}^{\infty}\delta(t)\,e^{-i\omega t}\,dt = 1, \tag{B.36} $$

and

$$ \mathcal{F}[\delta(t - t_0)] = \int_{-\infty}^{\infty}\delta(t - t_0)\,e^{-i\omega t}\,dt = e^{-i\omega t_0}. $$

In and of themselves, these transform pairs are unremarkable. By using
the duality property (Eq. B.35), however, we arrive at the decidedly more
exotic pairs,

$$ 1 \iff 2\pi\delta(\omega), \qquad e^{i\omega_0 t} \iff 2\pi\delta(\omega - \omega_0). $$

The trigonometric functions $\cos\omega t$ and $\sin\omega t$ can be written as a sum
of complex exponentials, leading to the following Fourier transforms,

$$ \mathcal{F}[\cos\omega_0 t] = \pi\left[\delta(\omega - \omega_0) + \delta(\omega + \omega_0)\right], $$

$$ \mathcal{F}[\sin\omega_0 t] = i\pi\left[\delta(\omega + \omega_0) - \delta(\omega - \omega_0)\right]. $$

The Fourier transform can therefore be thought of as a portrait of
the frequency content of the signal $f(t)$. A peak at some frequency $\omega_0$
indicates a dominant oscillatory component in the signal. Conversely, a
flat spectrum indicates very little structure in the signal (as is the case
for white noise).


B.4.3 z-Transform


The z-transform is essentially a discrete form of the Laplace transform.
Formally, given a sequence $\{x_k\}_{-\infty}^{\infty}$ of complex numbers $x_k$, the z-transform
of the sequence is defined as

$$ \mathcal{Z}\{x_k\}_{-\infty}^{\infty} = \hat{X}(z) = \sum_{k=-\infty}^{\infty}\frac{x_k}{z^k}, $$

whenever the sum exists.
A sequence is called causal if $x_k = 0$ for $k < 0$. For a causal sequence
then, the z-transform reduces to,

$$ \mathcal{Z}\{x_k\}_{0}^{\infty} = \hat{X}(z) = \sum_{k=0}^{\infty}\frac{x_k}{z^k}. $$

The z-transform is useful to solve linear difference equations, such as
one finds in trying to solve for the equilibrium probability distribution of a
random walk on a discrete lattice. It also appears in the guise of moment
generating functions, again used to solve linear difference equations.


B.5 Partial differential equations 


Partial differential equations are equations characterizing the behaviour
of a multivariable function in terms of partial derivatives. They are notoriously
difficult to solve, particularly if the equation is nonlinear in
the function of interest. For the purposes of this course, we will only
consider linear partial differential equations that will be susceptible to
solution through a variety of simple methods. For details, consult a text
devoted to the solution of partial differential equations, for example the
reasonably-priced


• S. J. Farlow, Partial Differential Equations for Scientists and Engineers,
(Dover, 1993).


B.5.1 Method of characteristics


Notice that the chain-rule applied to the function $u(x(s), t(s))$ of parameterized
curves $(x(s), t(s))$,

$$ \frac{du}{ds} = \frac{\partial u}{\partial x}\frac{dx}{ds} + \frac{\partial u}{\partial t}\frac{dt}{ds}, $$

implies that the first-order partial differential equation

$$ a(x,t,u)\frac{\partial u}{\partial x} + b(x,t,u)\frac{\partial u}{\partial t} = c(x,t,u), $$

can be written as a system of ordinary differential equations,

$$ \frac{dx}{ds} = a(x(s),t(s),u(s)), \qquad \frac{dt}{ds} = b(x(s),t(s),u(s)), \qquad \frac{du}{ds} = c(x(s),t(s),u(s)). $$

The method of characteristics is only applicable to first-order partial differential
equations, although these occur in applications when the moment
generating function is used to transform a master equation with linear
transition rates (see Section 4.1.1).
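To make the reduction concrete, the sketch below integrates the three characteristic equations with a standard fourth-order Runge-Kutta step. The test problem (constant advection speed with linear decay, $a = 1.5$, $b = 1$, $c = -0.3u$, and a Gaussian initial profile) is a hypothetical choice made only because its exact solution, $u(x,t) = u_0(x - 1.5t)\,e^{-0.3t}$, is available for comparison; the function names and step count are likewise assumptions.

```python
import math

def rk4_step(rhs, state, ds):
    """One classical Runge-Kutta step for d(state)/ds = rhs(state)."""
    k1 = rhs(state)
    k2 = rhs([y + 0.5 * ds * k for y, k in zip(state, k1)])
    k3 = rhs([y + 0.5 * ds * k for y, k in zip(state, k2)])
    k4 = rhs([y + ds * k for y, k in zip(state, k3)])
    return [y + ds * (p + 2 * q + 2 * r + w) / 6.0
            for y, p, q, r, w in zip(state, k1, k2, k3, k4)]

def characteristic(a, b, c, x0, t0, u0, s_end, steps=1000):
    """Follow one characteristic curve (x(s), t(s), u(s)) from s = 0 to s_end."""
    rhs = lambda y: [a(*y), b(*y), c(*y)]
    state, ds = [x0, t0, u0], s_end / steps
    for _ in range(steps):
        state = rk4_step(rhs, state, ds)
    return state

# Hypothetical test problem: 1.5 u_x + u_t = -0.3 u, with u(x, 0) = exp(-x^2).
a = lambda x, t, u: 1.5
b = lambda x, t, u: 1.0
c = lambda x, t, u: -0.3 * u
u_init = lambda x: math.exp(-x * x)

x0 = 0.2
x, t, u = characteristic(a, b, c, x0, 0.0, u_init(x0), s_end=2.0)
print(x, t, u)
print(u_init(x - 1.5 * t) * math.exp(-0.3 * t))   # exact solution at (x, t)
```

Each call traces a single curve; repeating it over a range of starting points `x0` fills in the solution surface.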


B.5.2 Separation of variables 


Partial differential equations with particular underlying symmetry governing
the function $u(x,t)$ can be transformed to a coupled system of ordinary
differential equations by assuming a solution of the form $u(x,t) =
X(x)T(t)$. For example, the diffusion equation,

$$ \frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}, $$

is separable. Substitution of a solution of the form $u(x,t) = X(x)T(t)$
yields two ordinary differential equations for $X(x)$ and $T(t)$. It was in
this context that Fourier derived his famous series.


B.5.3 Transform methods 


The transform methods, Laplace and Fourier, because they convert derivatives
into algebraic expressions involving the transformed functions, are
particularly powerful methods for solving partial differential equations.
The limitations are that they only work if the coefficients are constant in
the dependent variable and inversion back to the original problem space
can be difficult (or impossible). Typically, Laplace transforms are used
to transform variables with domain defined on the positive half-line; for
example, Laplace transform of $u(x,t)$ with $t > 0$ yields an ordinary differential
equation for $\hat{U}(x,s)$. Fourier transforms, on the other hand, because
of the unbounded support of the transform integral, are used when
the function of interest has an unbounded domain; for example, Fourier
transform of $u(x,t)$ with $-\infty < x < \infty$ yields an ordinary differential
equation for $\hat{U}(\omega,t)$.


B.6 Some Results from Asymptotics and Func- 
tional Analysis 


Occasionally, use is made of some important results from functional anal- 
ysis and asymptotic analysis. We will use only basic results that can be 
proved without much effort. 


B.6.1 Cauchy-Schwarz Inequality 


If $f(t)$ and $g(t)$ are any real-valued functions, then it is obviously true
that for any real constant $\lambda \in \mathbb{R}$,

$$ \int_L^U \left\{f(t) + \lambda g(t)\right\}^2 dt \geq 0, $$

since the integrand is nowhere negative. Expanding this expression,

$$ \lambda^2\int_L^U g^2(t)\,dt + 2\lambda\int_L^U f(t)g(t)\,dt + \int_L^U f^2(t)\,dt \geq 0. \tag{B.37} $$
For fixed limits of integration $U$ and $L$ (possibly infinite), the value of
each integral is simply a constant, call them $a$, $b$, and $c$. Eq. B.37 reduces
to,

$$ h(\lambda) = a\lambda^2 + 2b\lambda + c \geq 0, \tag{B.38} $$

where,

$$ a = \int_L^U g^2(t)\,dt, \qquad b = \int_L^U f(t)g(t)\,dt, \qquad c = \int_L^U f^2(t)\,dt. $$

The condition $h(\lambda) \geq 0$ means a plot of $h(\lambda)$ must lie above the $\lambda$-axis
and cannot cross it. At most, it may touch the axis, in which case we
have the double root $\lambda = -b/a$. For $h(\lambda)$ above the $\lambda$-axis, we must have
that the roots of $h(\lambda)$ are complex (not real). From the quadratic formula, we
then have,

$$ h(\lambda) \geq 0 \iff b^2 \leq ac. \tag{B.39} $$

In terms of our original integrals,

$$ \left\{\int_L^U f(t)g(t)\,dt\right\}^2 \leq \left\{\int_L^U f^2(t)\,dt\right\}\left\{\int_L^U g^2(t)\,dt\right\}. \tag{B.40} $$

This is the Cauchy-Schwarz inequality.


B.6.2 Watson’s Lemma 


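For $0 < b < \infty$, with,

$$ f(t) \sim f_0 (t - t_0)^{\alpha} + f_1 (t - t_0)^{\beta}, \qquad -1 < \alpha < \beta, \tag{B.41} $$

as $t \to t_0$ (here $t_0 = 0$, the lower limit of integration), the estimate for the
following integral holds,

$$ \int_0^b f(t)\,e^{-xt}\,dt \sim f_0\frac{\Gamma(1+\alpha)}{x^{1+\alpha}} + f_1\frac{\Gamma(1+\beta)}{x^{1+\beta}} \qquad \text{as } x \to \infty. \tag{B.42} $$

More generally, for $-\infty < a < b < \infty$, and,

$$ g(t) \sim g_0 + g_1 (t - t_0)^{\lambda}, \qquad \lambda > 0,\; g_1 \neq 0, \tag{B.43} $$

the estimate for the following integral holds,

$$ \int_a^b f(t)\,e^{-x g(t)}\,dt \sim \frac{2 f_0}{\lambda}\,\Gamma\!\left(\frac{1+\alpha}{\lambda}\right)\left(\frac{1}{x g_1}\right)^{\frac{1+\alpha}{\lambda}} e^{-x g_0}, \qquad \text{as } x \to \infty, \tag{B.44} $$

where the minimum of $g(t)$ occurs at $t_0$ ($a < t_0 < b$). If the minimum
$t_0$ should fall on either of the end points $a$ or $b$, then the factor of 2 no
longer appears; i.e. for $t_0 = a$ or $t_0 = b$,

$$ \int_a^b f(t)\,e^{-x g(t)}\,dt \sim \frac{f_0}{\lambda}\,\Gamma\!\left(\frac{1+\alpha}{\lambda}\right)\left(\frac{1}{x g_1}\right)^{\frac{1+\alpha}{\lambda}} e^{-x g_0}, \qquad \text{as } x \to \infty. \tag{B.45} $$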


B.6.3 Stirling’s Approximation 


Stirling’s approximation is:

$$ n! \approx \sqrt{2\pi n}\; n^n e^{-n} \qquad \text{for } n \gg 1, \tag{B.46} $$

which is incredibly accurate. To justify the expression, we write the factorial
in its equivalent form as an improper integral,

$$ n! = \int_0^{\infty} x^n e^{-x}\,dx. \tag{B.47} $$

Note that the integrand $F(x) = x^n e^{-x}$ is a sharply-peaked function of
$x$, and so we seek an approximation of the integrand near the maximum.
Actually, $F(x)$ is so sharply-peaked that it turns out to be more convenient
to consider the logarithm $\ln F(x)$. It is straightforward to show
that $\ln F(x)$ (and hence, $F(x)$) has a maximum at $x = n$. We make the
change of variable $x = n + \varepsilon$, and expand $\ln F(x)$ in a power-series in $\varepsilon$,

$$ \ln F = n\ln x - x = n\ln(n + \varepsilon) - (n + \varepsilon) \tag{B.48} $$

$$ \ln F \approx n\ln n - n - \frac{1}{2}\frac{\varepsilon^2}{n}. $$

Or, in terms of the original integrand,

$$ F \approx n^n e^{-n} e^{-\frac{\varepsilon^2}{2n}}. \tag{B.49} $$

From the integral for the factorial, Eq. B.47,

$$ n! \approx \int_{-n}^{\infty} n^n e^{-n} e^{-\frac{\varepsilon^2}{2n}}\,d\varepsilon \approx n^n e^{-n}\int_{-\infty}^{\infty} e^{-\frac{\varepsilon^2}{2n}}\,d\varepsilon, \tag{B.50} $$

where we have replaced the lower-limit of integration by $-\infty$. The remaining
integral is simply the integral of an unnormalized Gaussian
probability distribution with variance $n$, equal to $\sqrt{2\pi n}$.
Therefore,

$$ n! \approx \sqrt{2\pi n}\; n^n e^{-n} \qquad \text{for } n \gg 1, \tag{B.51} $$


as above. 
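The accuracy of Eq. B.46 is easy to check numerically. The short sketch below compares the approximation with the exact factorial for a few values of $n$; the function name and the particular values of $n$ are arbitrary choices made for this illustration.

```python
import math

def stirling(n):
    """Stirling's approximation sqrt(2 pi n) n^n e^(-n), Eq. B.46."""
    return math.sqrt(2.0 * math.pi * n) * n ** n * math.exp(-n)

def relative_error(n):
    """Relative error of the approximation against the exact factorial."""
    exact = math.factorial(n)
    return abs(stirling(n) - exact) / exact

for n in (1, 5, 10, 50):
    print(n, math.factorial(n), stirling(n), relative_error(n))
```

The relative error falls below one percent by $n = 10$ and continues to shrink roughly in proportion to $1/n$, even though the derivation assumed $n \gg 1$.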


B.6.4 Eigenvalue perturbation 


Suggested references 


One of the finest books on general mathematical methods is 


• M. L. Boas, Mathematical Methods in the Physical Sciences, (Wiley,
2005).


It has gone through several editions, and no harm comes from picking up 
a cheap used copy of an earlier edition. 
For asymptotic and perturbation methods, 


• C. M. Bender and S. A. Orszag, Advanced Mathematical Methods
for Scientists and Engineers: Asymptotic Methods and Perturbation
Theory, (Springer, 1999),


is excellent. It contains a huge survey of methods by masters of the field. 
It, too, has gone through various editions and can be picked up used. The 
short book by Hinch, 


• E. J. Hinch, Perturbation methods, (Cambridge University Press,
1991),


provides a more unified presentation than Bender and Orszag, and may 
be more useful for self-study. 


APPENDIX C 




ITÔ CALCULUS


Itô’s calculus is used very commonly in financial applications of stochastic
processes, presumably because it allows very concise proofs of boundedness,
continuity, etc., that would be difficult or impossible using Stratonovich’s
interpretation of a white-noise stochastic differential equation. A very
useful formula developed by Itô is his rule for a change of variables. We
will derive the change of variable formula and other related properties of
the Itô stochastic integral in this brief appendix.


C.1 Itô’s stochastic integral


We would like to assign a meaning to the integral $\int_{t_0}^{t} G(t')\,dW(t')$, where
$dW(t)$ is the increment of a Wiener process, and $G(t)$ is an arbitrary
function. Following the formulation of the Riemann-Stieltjes integral,
we divide the domain into $n$ subintervals $[t_0,t_1] \cup [t_1,t_2] \cup \cdots \cup [t_{n-1},t]$
and choose intermediate points $\tau_i$ such that $\tau_i \in [t_{i-1},t_i]$. The integral
$\int_{t_0}^{t} G(t')\,dW(t')$ is then defined as the (mean-square) limit of the partial
sum,

$$ S_n = \sum_{i=1}^{n} G(\tau_i)\left[W(t_i) - W(t_{i-1})\right], $$

as $n \to \infty$. In contrast with the Riemann-Stieltjes integral, however, the
limit depends upon the choice of intermediate point $\tau_i$! One can show that
the two interpretations of the stochastic integral discussed in Section 7.4.1
correspond to particular choices of $\tau_i$: for the Itô interpretation, $\tau_i$ is the
initial point in the interval, $\tau_i = t_{i-1}$, while the Stratonovich interpretation
amounts to choosing $\tau_i$ as the mid-point $\tau_i = \frac{t_i + t_{i-1}}{2}$.


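C.1.1 Example - $\int_{t_0}^{t} W(t')\,dW(t')$


Writing $W_i \equiv W(t_i)$ and $\Delta W_i \equiv W_i - W_{i-1}$, in the Itô interpretation
($\tau_i = t_{i-1}$),

$$ S_n = \sum_{i=1}^{n} W_{i-1}\Delta W_i = \frac{1}{2}\sum_{i=1}^{n}\left[(W_{i-1} + \Delta W_i)^2 - (W_{i-1})^2 - (\Delta W_i)^2\right] $$

$$ = \frac{1}{2}\left[W^2(t) - W^2(t_0)\right] - \frac{1}{2}\sum_{i=1}^{n}\Delta W_i^2. \tag{C.1} $$

The mean-square limit of the last term can be computed: Since

$$ \left\langle\sum_i \Delta W_i^2\right\rangle = t - t_0, \tag{C.2} $$

we find,

$$ \left\langle\left[\sum_i (W_i - W_{i-1})^2 - (t - t_0)\right]^2\right\rangle = 2\sum_i (t_i - t_{i-1})^2 \to 0, \tag{C.3} $$

as $n \to \infty$. Or, equivalently,

$$ \int_{t_0}^{t} W(t')\,dW(t') = \frac{1}{2}\left[W^2(t) - W^2(t_0) - (t - t_0)\right] \quad \text{(ITÔ)} \tag{C.4} $$

In the Stratonovich interpretation, one can show that (Exercise 2),

$$ \int_{t_0}^{t} W(t')\,dW(t') = \frac{1}{2}\left[W^2(t) - W^2(t_0)\right] \quad \text{(STRATONOVICH)} \tag{C.5} $$

For an arbitrary function $G(t)$, the Stratonovich integral has no relationship
whatever to the Itô integral, unless, of course, $G(t)$ is related to a
stochastic differential equation, in which case there is an explicit relationship as
explained in Section 7.4.1.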
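The difference between the two interpretations shows up clearly in a direct simulation of the partial sums. The sketch below generates one discretized Wiener path and evaluates $S_n$ with $G = W$ taken at the initial point (Itô) and, for the Stratonovich case, as the average of the two end-point values of $W$ on each subinterval. That average is an assumption used here as a convenient stand-in for evaluating $W$ at the mid-point time; the function names, seed, and step count are also choices made only for this illustration.

```python
import math
import random

def wiener_path(t_end, n, rng):
    """Discretized Wiener process W(0) = 0 sampled at n equal steps."""
    dt = t_end / n
    W = [0.0]
    for _ in range(n):
        W.append(W[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return W

def ito_sum(W):
    """Partial sum with G evaluated at the initial point of each subinterval."""
    return sum(W[i - 1] * (W[i] - W[i - 1]) for i in range(1, len(W)))

def stratonovich_sum(W):
    """Partial sum with G taken as the average of the end-point values."""
    return sum(0.5 * (W[i - 1] + W[i]) * (W[i] - W[i - 1])
               for i in range(1, len(W)))

rng = random.Random(0)          # fixed seed so the run is repeatable
t_end, n = 1.0, 20000
W = wiener_path(t_end, n, rng)

print(ito_sum(W), 0.5 * (W[-1] ** 2 - t_end))     # compare with Eq. C.4
print(stratonovich_sum(W), 0.5 * W[-1] ** 2)      # compare with Eq. C.5
```

For a single path the Itô sum agrees with Eq. C.4 only up to a fluctuation that shrinks as $n$ grows (this is the mean-square limit of Eq. C.3), whereas the averaged sum telescopes and reproduces Eq. C.5 to round-off.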
C.2 Change of Variables Formula


Computations using Itô’s calculus are greatly facilitated by the following
identities,

$$ dW^2(t) = dt \qquad\text{and}\qquad dW^{2+N}(t) = 0 \quad (N > 0). $$

That is, written more precisely,

$$ \int_{t_0}^{t}\left[dW(t')\right]^{2+N} G(t') = \begin{cases} \int_{t_0}^{t} G(t')\,dt', & N = 0, \\ 0, & N > 0. \end{cases} $$

These relations lead to a generalized chain rule in the Itô interpretation.
For example,

$$ d\{\exp[W(t)]\} = \exp[W(t) + dW(t)] - \exp[W(t)] $$

$$ = \exp[W(t)]\left[dW(t) + \frac{1}{2}dW^2(t)\right] $$

$$ = \exp[W(t)]\left[dW(t) + \frac{1}{2}dt\right]. $$

Written in general, for an arbitrary function $f[W(t),t]$,

$$ df[W(t),t] = \left(\frac{\partial f}{\partial t} + \frac{1}{2}\frac{\partial^2 f}{\partial W^2}\right)dt + \frac{\partial f}{\partial W}\,dW(t). \tag{C.6} $$

Suppose we have a function $X(t)$ characterized by the Langevin equation,

$$ dX(t) = a(X,t)\,dt + b(X,t)\,dW(t), \tag{C.7} $$

and $\phi(X)$ is a function of the solution of this stochastic differential equation.
Using the generalized chain rule, Eq. C.6, and the Langevin equation,
Eq. C.7, we obtain Itô’s change of variables formula:

$$ d\phi(X(t)) = \phi(X(t) + dX(t)) - \phi(X(t)) $$

$$ = \phi'(X(t))\,dX(t) + \frac{1}{2}\phi''(X(t))\left[dX(t)\right]^2 + \cdots $$

$$ = \phi'(X(t))\left[a(X(t),t)\,dt + b(X(t),t)\,dW(t)\right] + \frac{1}{2}\phi''(X(t))\,b^2(X(t),t)\left[dW(t)\right]^2 + \cdots $$

$$ = \left\{a(X(t),t)\,\phi'(X(t)) + \frac{1}{2}b^2(X(t),t)\,\phi''(X(t))\right\}dt + b(X(t),t)\,\phi'(X(t))\,dW(t). \tag{C.8} $$
Notice that if $\phi(X)$ is linear in $X(t)$, then the anomalous term,

$$ \frac{1}{2}b^2(X(t),t)\,\phi''(X(t)), $$

vanishes and the change of variables formula reduces to the chain rule of
ordinary (Stratonovich) calculus.


C.2.1 Example - Calculate $\langle\cos[\eta(t)]\rangle$


Suppose we have white noise governed by the stochastic differential equation

$$ d\eta = \Gamma\,dW, \qquad \eta(0) = 0, $$

i.e. $\langle\eta(t)\rangle = 0$ and $\langle\eta(t)\eta(t')\rangle = \Gamma^2\delta(t - t')$, and we would like to compute
$\langle\cos[\eta(t)]\rangle$. Using Itô’s change of variables or the generalized chain rule,
we arrive at the equation governing $\cos[\eta(t)]$,

$$ d\cos\eta = -\frac{\Gamma^2}{2}\cos\eta\,dt - \Gamma\sin\eta\,dW, $$

since $a(X) = 0$ and $b(X) = \Gamma$ in Eq. C.8. Taking the average, the last term
on the right-hand side vanishes, leaving behind the ordinary differential
equation

$$ \frac{d\langle\cos\eta\rangle}{dt} = -\frac{\Gamma^2}{2}\langle\cos\eta\rangle, \qquad \langle\cos\eta(0)\rangle = 1. $$

The solution is simply,

$$ \langle\cos\eta\rangle = e^{-\frac{\Gamma^2}{2}t}. $$
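This result can be checked against a simple Monte Carlo estimate. The sketch below uses the fact that, for $d\eta = \Gamma\,dW$ with $\eta(0) = 0$, the value $\eta(t) = \Gamma W(t)$ is Gaussian with zero mean and variance $\Gamma^2 t$, so it can be sampled directly without time-stepping. The function name, parameter values, sample size, and seed are all assumptions made for this illustration.

```python
import math
import random

def mean_cos_eta(gamma, t, samples, rng):
    """Monte Carlo estimate of <cos(eta(t))> for eta(t) = gamma * W(t)."""
    total = 0.0
    for _ in range(samples):
        w_t = rng.gauss(0.0, math.sqrt(t))     # W(t) ~ Normal(0, t)
        total += math.cos(gamma * w_t)
    return total / samples

rng = random.Random(1)                          # fixed seed for repeatability
gamma, t = 0.8, 2.0
estimate = mean_cos_eta(gamma, t, 50000, rng)
exact = math.exp(-gamma ** 2 * t / 2.0)         # solution obtained above
print(estimate, exact)
```

The two numbers should agree to within the sampling error, which for this sample size is a few parts in a thousand.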


Suggested References 


This chapter is taken from Gardiner’s Handbook, 


• Handbook of stochastic methods (3rd Ed.), C. W. Gardiner (Springer,
2004),


that contains a large collection of examples illustrating the range and
applicability of Itô calculus.
Exercises


1. Fill in the details in the derivation of Eq. C.4 from Eq. C.1.
2. Prove Eq. C.5.


3. Repeat the calculation of $\langle\cos[\eta(t)]\rangle$ using the average of the cumulant
generating function (Eq. A.10 on p. 263) instead of Itô’s
change of variable formula. Could you use Itô’s formula for Gaussian
coloured noise (i.e. noise with nonzero correlation time)? Could
you use the cumulant generating function for Gaussian coloured
noise? Repeat the calculation for $\langle\cos[F(t)]\rangle$ for the Ornstein-Uhlenbeck
process $F(t)$ with autocorrelation function,


4. Milstein simulation algorithm: Using Itô’s change of variables
formula, and neglecting terms of $O(\Delta t^2)$, show that (Eq. 8.7),

$$ \int_0^{\Delta t}\left[c(y(t'),t') - c(y(0),0)\right]dW(t') = c(y(0),0)\cdot c_y(y(0),0)\left[\frac{W^2(\Delta t)}{2} - \frac{\Delta t}{2}\right], $$

where $y(t)$ obeys the Langevin equation (interpreted in the Itô
sense),

$$ dy = A(y,t)\,dt + c(y,t)\,dW(t). \tag{C.9} $$

In that way, justify Milstein’s scheme for simulating Eq. C.9,

$$ y(t + \Delta t) = y(t) + A(y,t)\,\Delta t + n_1\cdot c(y,t)\sqrt{\Delta t} - \frac{\Delta t}{2}\,c(y,t)\cdot c_y(y,t)\cdot(1 - n_1^2), $$

where $n_i$ are independent samples of a unit Normal distribution
$N(0,1)$.


APPENDIX D 


SAMPLE MATLAB CODE 


In this chapter, there are several examples of code used to generate simu- 
lation data shown in the main notes. The routines are not fully annotated, 
and some familiarity with Matlab is assumed. If you are not familiar with 
Matlab, there are several good references, both print and web-based, in- 
cluding the following: 


• Numerical methods using MATLAB, J. H. Mathews and K. D. Fink
(Pearson, 2004).


• http://www.cs.ubc.ca/spider/cavers/MatlabGuide/guide.html


The most conspicuous absence is fully-assembled code; only
small steps in the simulation algorithms are shown. This is partly to
allow the reader to assemble full programs in their own style, and partly
because full routines require many tedious lines of code for pre-allocating
memory, plotting results, etc., that only detract from the presentation of
the algorithms.


D.1 Examples of Gillespie’s direct method 


• D. T. Gillespie (1977) “Exact stochastic simulation of coupled chemical reactions,” Journal
of Physical Chemistry 81: 2340.
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D.1.1 Propensity vector and stoichiometry matrix 
For a system with N reactants and M reactions, we define two functions: 


1. The first is run at the initialization of the program, and returns the 
transpose of the stoichiometry matrix Smat (an M x N matrix). 


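function Smat = defineReactions(N,M) 
% Generates the transpose of the stoichiometry matrix (M x N) 
Smat = zeros(M,N); 

% Row j holds the change in each reactant when reaction j fires once. 
% Fill in one row per reaction with the integers for your model, 
% following the pattern: 
% 
% reactant        1       2      ...    N 
% Smat(1,:) = [ s(1,1)  s(1,2)   ...  s(1,N) ]; 
% Smat(2,:) = [ s(2,1)  s(2,2)   ...  s(2,N) ]; 
%   ... 
% Smat(M,:) = [ s(M,1)  s(M,2)   ...  s(M,N) ]; 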


2. The second takes as arguments the present state of the system X 
and the system size OMEGA, returning a vector of the reaction rates 
vVec (an M × 1 vector). The microscopic reaction rates ν_j are written 
in terms of the reactant concentrations (see p. 72). The OMEGA 
multiplying each rate ν_j converts the units from concentration per 
time to per time. 


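function vVec=vVecIn(X,OMEGA) 
% Function returning vector of M vVec which are functions of 
% N reactants X 

% Writing the reactions in terms of concentration 
Xc=X/OMEGA; 

% nu_j(Xc) stands for the rate expression of reaction j (model-specific). 
% Preallocate with vVec = zeros(M,1) to return a column vector. 
vVec(1) = OMEGA*nu_1(Xc); 
vVec(2) = OMEGA*nu_2(Xc); 
% ... 
vVec(M) = OMEGA*nu_M(Xc); 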
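
To make the two templates concrete, here is a minimal sketch for a hypothetical one-species birth-death process (∅ → X at rate alpha, X → ∅ at rate beta per molecule), which is not an example taken from the text. An anonymous function stands in for vVecIn so that the fragment runs as a single script; the names alpha, beta and vVecFun, and all numerical values, are illustrative assumptions.

```matlab
% Hypothetical birth-death process: 0 -> X (rate alpha), X -> 0 (rate beta*x)
N = 1; M = 2;                   % one reactant, two reactions
alpha = 10; beta = 1;           % assumed rate constants (concentration units)
OMEGA = 100;                    % system size

% Transpose of the stoichiometry matrix: row j is the change due to reaction j
Smat = zeros(M,N);
Smat(1,:) = 1;                  % birth adds one molecule
Smat(2,:) = -1;                 % death removes one molecule

% Propensity vector (M x 1): OMEGA times the rate written in concentrations
vVecFun = @(X,OMEGA) OMEGA*[alpha; beta*(X(1)/OMEGA)];

X = 500;                        % present state (number of molecules)
vVec = vVecFun(X,OMEGA);        % [1000; 500] reactions per unit time

% The mean rate of change of the concentration is Smat'*vVec/OMEGA
drift = Smat'*vVec/OMEGA;       % alpha - beta*x = 10 - 5 = 5
disp(vVec'); disp(drift);
```

Multiplying the transposed stoichiometry matrix by the propensity vector in this way is a quick check that the signs and units of both objects are consistent with the intended rate equations.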


D.1.2 Core stochastic simulation algorithm 


Once the stoichiometry matrix and propensity vector are defined, the core 
simulation algorithm is straightforward. The current state of the system 
X, the current time T, the stoichiometry matrix Smat and the system 
size Ω (OMEGA) are input, and the updated state and time, Xnew and Tnew, 
are output. 


function [Xnew, Tnew]=CoreSSA(X,T,Smat,OMEGA) 

[M,N]=size(Smat); % N reactants, M reactions 

% Step 1: Calculate a_mu & a_0 
a_mu = vVecIn(X,OMEGA); 
a_0 = sum(a_mu); 

% Step 2: Calculate tau and mu using random number generators 
% tau is the waiting *time* until the next reaction completes and 
% next_mu is the *index* of the next reaction. 
r1 = rand; 
tau = (1/a_0)*log(1/r1); 

r2 = rand; 
next_mu = find(cumsum(a_mu)>=r2*a_0,1,'first'); 

% Step 3: Update the system: 
% carry out reaction next_mu 
dX = Smat(next_mu,1:N); % Stoichiometry of update 
Xnew = X;               % same shape as X 
for i=1:N 
    Xnew(i) = X(i)+dX(i); 
end 

% update the time 
Tnew = T + tau; 
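
One way to iterate the core step is sketched below for a hypothetical birth-death process (∅ → X at rate alpha, X → ∅ at rate beta per molecule). The three steps are written inline, and an anonymous function vVecFun stands in for vVecIn, purely so that the fragment runs as a single script; the parameter values, the number of events nSteps and the history arrays Xhist and Thist are assumptions made for the illustration.

```matlab
% Repeated direct-method steps for a hypothetical birth-death process
OMEGA = 50; alpha = 10; beta = 1;                 % assumed parameters
Smat  = [1; -1];                                  % M x N with M = 2, N = 1
vVecFun = @(X) OMEGA*[alpha; beta*X(1)/OMEGA];    % stands in for vVecIn

nSteps = 2000;                                    % number of reaction events
Xhist = zeros(nSteps+1,1); Thist = zeros(nSteps+1,1);
X = 0; T = 0; Xhist(1) = X; Thist(1) = T;

for k = 1:nSteps
    a_mu = vVecFun(X);                            % Step 1: propensities
    a_0  = sum(a_mu);
    tau  = (1/a_0)*log(1/rand);                   % Step 2: waiting time
    next_mu = find(cumsum(a_mu) >= rand*a_0, 1, 'first');  % reaction index
    X = X + Smat(next_mu,:);                      % Step 3: update the state
    T = T + tau;                                  % and the time
    Xhist(k+1) = X; Thist(k+1) = T;
end

fprintf('Final time %.3f, final concentration %.3f\n', T, X/OMEGA);
```

Because the state changes only at reaction events, the pair (Thist, Xhist) describes a piecewise-constant trajectory and is best plotted with a staircase plot rather than by joining the points with straight lines.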
D.1.3 Example - the Brusselator 


The Brusselator example (Section 4.2) is a simple model that exhibits 
limit cycle and stable behaviour over a range of parameter values. The 
reactions are, 


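\[
\begin{aligned}
\emptyset &\longrightarrow x, \\
2x + y &\xrightarrow{\;a\;} 3x, \\
x &\xrightarrow{\;b\;} y, \\
x &\longrightarrow \emptyset.
\end{aligned}
\]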
In the Gillespie algorithm, the propensity vector is coded as, 


function vVec=vVecIn(X,OMEGA) 
global a b % if your parameter values are declared as global variables 
x=X(1)/OMEGA; xm1=(X(1)-1)/OMEGA; y=X(2)/OMEGA; 

vVec = zeros(4,1);           % column vector of the M = 4 propensities 
vVec(1) = OMEGA*(1); 
vVec(2) = OMEGA*(a*x*xm1*y); 
vVec(3) = OMEGA*(b*x); 
vVec(4) = OMEGA*(x); 


The stoichiometry matrix is, 


function Smat = defineReactions(N,M) 
Smat = zeros(M,N); 
% reactant     x   y 
Smat(1,:) = [  1   0]; 
Smat(2,:) = [  1  -1]; 
Smat(3,:) = [ -1   1]; 
Smat(4,:) = [ -1   0]; 
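
A quick consistency check on the propensity vector and the stoichiometry matrix is to confirm that together they reproduce the deterministic Brusselator rate equations. The sketch below does this at a single state; the parameter values, the system size and the state X are arbitrary assumptions, and the two objects are written inline rather than through vVecIn and defineReactions so that the fragment runs as one script.

```matlab
% Consistency check for the Brusselator propensities and stoichiometry
a = 1; b = 2.5;                 % assumed parameter values
OMEGA = 1000;                   % system size
X = [1200; 800];                % assumed state: molecule numbers of x and y

x = X(1)/OMEGA; xm1 = (X(1)-1)/OMEGA; y = X(2)/OMEGA;
vVec = OMEGA*[1; a*x*xm1*y; b*x; x];       % propensity vector (4 x 1)

Smat = [ 1  0;                  % 0 -> x
         1 -1;                  % 2x + y -> 3x
        -1  1;                  % x -> y
        -1  0];                 % x -> 0

% Mean rate of change of the concentrations implied by the two objects
drift = Smat'*vVec/OMEGA;

% Deterministic rate equations, with x*(x - 1/OMEGA) in place of x^2
driftExpected = [1 + a*x*xm1*y - (b+1)*x;
                 -a*x*xm1*y + b*x];
disp([drift driftExpected]);
```

The two columns printed should agree to rounding error; a sign error in any row of Smat, or a misplaced factor of OMEGA, shows up immediately as a mismatch.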


D.2 Stochastic differential equations 
The update step for a simple forward-Euler method is shown. Higher-order 
methods are possible; see 


• Handbook of stochastic methods (3rd Ed.), C. W. Gardiner (Springer, 
2004), 


for details and references. 
D.2.1 White noise 


The Euler method for white noise is straightforward, and since the white 
noise is uncorrelated, its history does not need to be saved, in contrast 
with the coloured noise algorithm below, where a past value is needed. 

We are simulating a sample trajectory of the process y(t) characterized 
by the Itô stochastic differential equation, 


dy = A(y, t)dt + c(y,t)dW (t). 


The input is the present state of the system (y(t), t) and the time-step dt. 
The output is the updated state (y(t + dt), t + dt). 


function [yNew,tNew]=whiteEuler(y,t,dt) 
% A(y,t) and c(y,t) are user-supplied functions for the drift and 
% the noise amplitude 

yNew=y+A(y,t)*dt+c(y,t)*dt^(1/2)*randn; 
tNew=t+dt; 
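
As a usage illustration, the sketch below iterates the Euler update for an Ornstein-Uhlenbeck-type equation, dy = -gam*y dt + sig dW. The choice of model, the names gam and sig, and all numerical values are assumptions made for the example. The update is written inline, with A and c as anonymous functions, so that the fragment runs as a single script. The final loop switches the noise off, in which case the scheme reduces to the deterministic forward-Euler method and y_k = y_0 (1 - gam*dt)^k can be verified directly.

```matlab
% Forward-Euler trajectory of dy = -gam*y dt + sig dW (illustrative choice)
gam = 1; sig = 0.5;             % assumed parameters
A = @(y,t) -gam*y;              % drift A(y,t)
c = @(y,t) sig;                 % noise amplitude c(y,t)

dt = 0.01; nSteps = 500;
y = zeros(nSteps+1,1); t = zeros(nSteps+1,1);
y(1) = 2;                       % initial condition

for k = 1:nSteps
    % same update as whiteEuler, written inline
    y(k+1) = y(k) + A(y(k),t(k))*dt + c(y(k),t(k))*sqrt(dt)*randn;
    t(k+1) = t(k) + dt;
end

% Noise switched off: deterministic Euler, y_k = y_0*(1 - gam*dt)^k
yDet = y(1);
for k = 1:nSteps
    yDet = yDet + A(yDet,0)*dt;
end
fprintf('Noise-free Euler: %.6f, closed form: %.6f\n', ...
        yDet, y(1)*(1 - gam*dt)^nSteps);
```

Note that the noise term scales with sqrt(dt) while the drift scales with dt; halving the time step therefore does not halve the size of the random increment.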


D.2.2 Coloured noise 


We are simulating a sample trajectory of the process y(t) characterized 
by the stochastic differential equation, 


\[
\frac{dy}{dt} = A(y,t) + c(y,t)\,\eta(t),
\]


where η(t) is coloured noise with correlation time τ_c = tauC, normalized 
to unit strength (stationary variance 1/(2τ_c), so that it approaches unit 
white noise as τ_c → 0). The input is the present state of the system 
(y(t), t, η(t)) and the time-step dt. The output is the updated state 
(y(t + dt), t + dt, η(t + dt)). 


function [yNew,tNew,nNew]=colouredEuler(y,t,n,dt) 
global tauC % correlation time, if declared as a global variable 

rho=exp(-dt/tauC); 

yNew=y+A(y,t)*dt+c(y,t)*n*dt; 
nNew=rho*n+(1-rho^2)^(1/2)*(1/2/tauC)^(1/2)*randn; 
tNew=t+dt; 
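
The noise update can be checked on its own. Reading off the recursion, a noise value with variance v = 1/(2*tauC) is mapped to a new value with variance rho^2*v + (1 - rho^2)*v = v, and successive values have correlation rho = exp(-dt/tauC). The sketch below verifies both properties numerically; the values of tauC, dt and nSteps are assumptions chosen for the illustration, and the recursion is written inline so that the fragment runs as a single script.

```matlab
% Coloured-noise update: check stationary variance and one-step correlation
tauC = 2; dt = 0.2;             % assumed correlation time and time step
rho  = exp(-dt/tauC);           % one-step correlation coefficient
v    = 1/(2*tauC);              % stationary variance implied by the update

% Variance is preserved by one step: rho^2*v + (1 - rho^2)*v = v
vNext = rho^2*v + (1 - rho^2)*(1/2/tauC);

nSteps = 2e5;
n = zeros(nSteps,1);
n(1) = sqrt(v)*randn;           % start from the stationary distribution
for k = 1:nSteps-1
    n(k+1) = rho*n(k) + (1-rho^2)^(1/2)*(1/2/tauC)^(1/2)*randn;
end

sampleVar = var(n);                           % should be close to v
lag1 = mean(n(1:end-1).*n(2:end))/sampleVar;  % should be close to rho
fprintf('variance %.3f (expected %.3f), lag-1 correlation %.3f (expected %.3f)\n', ...
        sampleVar, v, lag1, rho);
```

Because the recursion is exact for any dt, the statistics of the noise itself do not depend on the time step; only the Euler update of y carries a discretization error.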


